Since last night I have had two more jobs with the same problem.
Hi,
I have a rogue ATLAS job that has broken one of my CREAM CEs by writing a 10 GB log file in a pilot directory:
/var/cream_sandbox/atlaspil/CN_Robot__ATLAS_Pilot2_CN_531497_CN_atlpilo2_OU_Users_OU_Organic_Units_DC_cern_DC_ch_atlas_Role_pilot_Capability_NULL_platl017/79/CREAM791629196/OSB/4998630.20.out
The log file at some point starts listing the files unpacked from a tar ball like this:
tarball_PandaJob_4024428114_ANALY_RHUL_SL6/usr/DiLepAna/1.0.0/InstallArea/x86_64-slc6-gcc62-opt/include/AnalysisHelpers/AnalysisHelpers/AnalysisHelpers
tarball_PandaJob_4024428114_ANALY_RHUL_SL6/usr/DiLepAna/1.0.0/InstallArea/x86_64-slc6-gcc62-opt/include/AnalysisHelpers/AnalysisHelpers/AnalysisHelpers/AnalysisHelpers
tarball_PandaJob_4024428114_ANALY_RHUL_SL6/usr/DiLepAna/1.0.0/InstallArea/x86_64-slc6-gcc62-opt/include/AnalysisHelpers/AnalysisHelpers/AnalysisHelpers/AnalysisHelpers/AnalysisHelpers
and it carries on like that, appending another /AnalysisHelpers to the end each time, and soon switches to an error message.
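For anyone wanting to spot this pattern in a log automatically, here is a minimal Python sketch. It only counts how many times the last path component repeats at the end of a line. The function name `trailing_repeat` and the idea of flagging a line above some threshold are my own assumptions for illustration, not anything taken from the pilot or PanDA tooling.

```python
def trailing_repeat(path):
    """Return (component, count): the last path component and how many
    times it repeats consecutively at the end of the path."""
    # Split on "/" and drop empty pieces (leading/trailing slashes).
    parts = [p for p in path.strip().split("/") if p]
    if not parts:
        return ("", 0)
    last = parts[-1]
    count = 0
    # Walk backwards until a different component is seen.
    for part in reversed(parts):
        if part != last:
            break
        count += 1
    return (last, count)
```

Applied to the three lines quoted above, the count goes 3, 4, 5, so a check such as `count > 10` (an arbitrary threshold) would flag the runaway listing early.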
This log file is up to around 10 GB and has filled my /var up, so I will copy the first 5000 lines in case it's useful and delete it.
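As a rough sketch of that step, the first 5000 lines can be saved without reading the whole 10 GB file into memory. The function name `save_head` and both paths are placeholders of mine. The destination should be on a filesystem other than the full /var.

```python
from itertools import islice


def save_head(src_path, dst_path, n_lines=5000):
    """Copy the first n_lines lines of src_path to dst_path.

    Returns the number of lines actually copied.
    """
    copied = 0
    # Binary mode avoids decoding errors on odd bytes in the log;
    # iterating over the file object reads it one line at a time.
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        for line in islice(src, n_lines):
            dst.write(line)
            copied += 1
    return copied
```

Only once the copy has been checked would the original log be deleted.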
Can anyone advise if/how to report it to ATLAS?
And how can I identify and stop this job?
(Apologies, I do not know my way around CREAM.)
Thanks,
Simon
To unsubscribe from the TB-SUPPORT list, click the following link:
https://www.jiscmail.ac.uk/cgi-bin/webadmin?SUBED1=TB-SUPPORT&A=1