What batch system do you use, Simon?
The CREAM9999999 ID is the key info.
I remember from our CREAM/Torque days that there are logs under
/var/log/cream/ where CREAM records which CREAM job ID maps to
which batch system job ID.
I still have the scripts I used to filter that info (see below). The
first script (cream_tracer.pl) takes a CREAM log piped in on stdin and
groups the sequence of messages for each job, printing them out in
blocks. The second script, cream_filter.pl, takes the output from the
first script plus one or more "key" parameters on the command line, and
restricts the output to the job blocks that contain one of those keys.
In the output will be the batch system ID. From memory, they worked like
this:
cat <SOMECREAMLOGFILE> | ./cream_tracer.pl | ./cream_filter.pl 791629196
Where 791629196 is the cream id number from the file name.
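If you just need the batch ID quickly, a plain grep over the log may be
enough. Note the two sample log lines and the "lrmsJobId" field name below
are made up for illustration -- check your own CREAM log for the actual
message wording and adjust the patterns:

```shell
# Hypothetical sketch: grep the CREAM log directly for one job's lines.
# The sample log lines (and the "lrmsJobId" field name) are invented --
# real CREAM log messages may be worded differently.
printf '%s\n' \
  'Job inserted. JobId = CREAM791629196' \
  'JobId = CREAM791629196 handed to batch system, lrmsJobId = 4998630.pbs' \
  > /tmp/sample_cream.log

# Keep only lines for the CREAM id of interest, then pick out the line
# carrying the batch system id.
grep 'CREAM791629196' /tmp/sample_cream.log | grep -i 'lrmsJobId'
```

On a real CE you would point the grep at the files under /var/log/cream/
rather than the sample file.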
Cheers,
Ste
--- SCRIPTS ---
1) cream_tracer.pl
#!/usr/bin/perl
# cream_tracer.pl -- read a CREAM log on STDIN and group the messages
# for each CREAM job id into blocks, printed in order of first appearance.
use strict;
use warnings;

my %messageBlocks;      # CREAM job id -> list of log lines
my @messageBlocksOrder; # CREAM job ids in order of first appearance

while (my $line = <STDIN>) {
    chomp($line);
    if ($line =~ /Job inserted\. JobId = (CREAM\d+)/) {
        # A job's first appearance: start a new block for it.
        my $creamJobId = $1;
        $messageBlocks{$creamJobId} = [$line];
        push(@messageBlocksOrder, $creamJobId);
    }
    elsif ($line =~ /(CREAM\d+)/) {
        # Later messages: append to the block, if we saw the job inserted.
        my $cid = $1;
        push(@{$messageBlocks{$cid}}, $line) if defined($messageBlocks{$cid});
    }
}

# Print the blocks, one per job, in order of first appearance.
foreach my $cid (@messageBlocksOrder) {
    print("\nCREAM Records for $cid\n");
    print("$_\n") foreach @{$messageBlocks{$cid}};
}
2) cream_filter.pl
#!/usr/bin/perl
# cream_filter.pl -- read cream_tracer.pl output on STDIN and print only
# those blocks containing at least one of the search terms given on the
# command line (e.g. a CREAM id number or a batch system job id).
use strict;
use warnings;

my @terms = @ARGV;
die "Usage: $0 TERM [TERM ...]\n" unless @terms;

my %blocks;
my @order;  # preserve the order the blocks arrive in
my $name;

while (my $l = <STDIN>) {
    if ($l =~ /CREAM Records for\s+(CREAM\d+)/) {
        # Header line: start a new block.
        $name = $1;
        $blocks{$name} = [$l];
        push(@order, $name);
    }
    elsif (defined($name)) {
        # Anything before the first header is ignored.
        push(@{$blocks{$name}}, $l);
    }
}

# Print every block in which any line contains any term. \Q...\E makes
# the terms match literally rather than as regex patterns.
foreach my $n (@order) {
    my $match = 0;
    LINE: foreach my $l (@{$blocks{$n}}) {
        foreach my $t (@terms) {
            if ($l =~ /\Q$t\E/) {
                $match = 1;
                last LINE;  # one hit is enough
            }
        }
    }
    print(@{$blocks{$n}}) if $match;
}
On 13/08/18 10:59, George, Simon wrote:
>
> Thanks Alastair.
>
> To you or any other helpful person, how do I get to a batch ID from a
> cream output file path?
>
>
>
> ------------------------------------------------------------------------
> *From:* Testbed Support for GridPP member institutes
> <[log in to unmask]> on behalf of Alastair Dewhurst
> <[log in to unmask]>
> *Sent:* 13 August 2018 10:01
> *To:* [log in to unmask]
> *Subject:* Re: atlas job filling up /var/cream_sandbox/atlaspil/ with
> huge log file
> Hi
>
> For any ATLAS issue, always cc in UK cloud support; if anyone is going
> to do anything about this, it will be someone on there. They are all
> in TB-support anyway, but it's always a good idea to make sure it isn't
> missed.
>
> You should also always provide a batch farm ID for the jobs. In the
> Panda monitor you can search for jobs by batch ID, which will then
> show which user actually submitted the jobs, etc., so they
> could be stopped from submitting more (by ATLAS).
>
> I am afraid I can’t help you with blocking them at the CREAM level.
>
> Alastair
>
>
>> On 13 Aug 2018, at 09:53, George, Simon <[log in to unmask]
>> <mailto:[log in to unmask]>> wrote:
>>
>> Thanks very much.
>> Done:
>> https://ggus.eu/index.php?mode=ticket_info&ticket_id=136675&come_from=submit
>> I wonder if anyone else has seen jobs like these?
>>
>>
>> ------------------------------------------------------------------------
>> *From:*Testbed Support for GridPP member institutes
>> <[log in to unmask] <mailto:[log in to unmask]>> on
>> behalf of Daniela Bauer
>> <[log in to unmask]
>> <mailto:[log in to unmask]>>
>> *Sent:*13 August 2018 09:15
>> *To:*[log in to unmask] <mailto:[log in to unmask]>
>> *Subject:*Re: atlas job filling up /var/cream_sandbox/atlaspil/ with
>> huge log file
>> ggus.org <http://ggus.org>, pick "VO specific" and choose Atlas. And
>> make it "urgent".
>> Cheers,
>> Daniela
>>
>> On Mon, 13 Aug 2018 at 09:10, George, Simon <[log in to unmask]
>> <mailto:[log in to unmask]>> wrote:
>> >
>> > Thanks Daniela.
>> >
>> > How do I submit a VO ticket? Can you point me to the relevant
>> ticketing system/email address please?
>> >
>> >
>> >
>> > ________________________________
>> > From: Testbed Support for GridPP member institutes
>> <[log in to unmask] <mailto:[log in to unmask]>> on
>> behalf of Daniela Bauer
>> <[log in to unmask]
>> <mailto:[log in to unmask]>>
>> > Sent: 13 August 2018 09:05
>> > To: [log in to unmask] <mailto:[log in to unmask]>
>> > Subject: Re: atlas job filling up /var/cream_sandbox/atlaspil/ with
>> huge log file
>> >
>> > Atlas specific VO ticket ? At least then the other sites can see it.
>> >
>> > Cheers,
>> > Daniela
>> > On Mon, 13 Aug 2018 at 09:02, George, Simon <[log in to unmask]
>> <mailto:[log in to unmask]>> wrote:
>> > >
>> > > Since last night I had two more jobs with the same problem.
>> > >
>> > >
>> > >
>> > > ________________________________
>> > > From: Testbed Support for GridPP member institutes
>> <[log in to unmask] <mailto:[log in to unmask]>> on
>> behalf of George, Simon <[log in to unmask]
>> <mailto:[log in to unmask]>>
>> > > Sent: 12 August 2018 22:34
>> > > To: [log in to unmask] <mailto:[log in to unmask]>
>> > > Subject: atlas job filling up /var/cream_sandbox/atlaspil/ with
>> huge log file
>> > >
>> > >
>> > > Hi,
>> > >
>> > > I have a rogue ATLAS job that has broken one of my CREAM CEs
>> with a 10 GB log file in a pilot directory,
>> > >
>> > >
>> /var/cream_sandbox/atlaspil/CN_Robot__ATLAS_Pilot2_CN_531497_CN_atlpilo2_OU_Users_OU_Organic_Units_DC_cern_DC_ch_atlas_Role_pilot_Capability_NULL_platl017/79/CREAM791629196/OSB/4998630.20.out
>> > >
>> > >
>> > > The log file at some point starts listing the files unpacked from
>> a tarball, like this:
>> > >
>> > >
>> tarball_PandaJob_4024428114_ANALY_RHUL_SL6/usr/DiLepAna/1.0.0/InstallArea/x86_64-slc6-gcc62-opt/include/AnalysisHelpers/AnalysisHelpers/AnalysisHelpers
>> > >
>> > >
>> tarball_PandaJob_4024428114_ANALY_RHUL_SL6/usr/DiLepAna/1.0.0/InstallArea/x86_64-slc6-gcc62-opt/include/AnalysisHelpers/AnalysisHelpers/AnalysisHelpers/AnalysisHelpers
>> > >
>> > >
>> tarball_PandaJob_4024428114_ANALY_RHUL_SL6/usr/DiLepAna/1.0.0/InstallArea/x86_64-slc6-gcc62-opt/include/AnalysisHelpers/AnalysisHelpers/AnalysisHelpers/AnalysisHelpers/AnalysisHelpers
>> > >
>> > > and carries on like that forever, adding another /AnalysisHelpers
>> at the end each time, soon changing to the error message
>> > >
>> > > "Cannot stat: Too many levels of symbolic links".
>> > > There is clearly some kind of symlink loop in this tar file.
>> > > Perhaps some of you see other jobs with the same problem at your
>> sites?
>> > >
>> > > This log file is up to around 10 GB and has filled my /var up, so
>> I will copy the first 5000 lines in case it's useful and delete it.
>> > >
>> > >
>> > > Can anyone advise if/how to report it to ATLAS?
>> > >
>> > > And how can I identify and stop this job?
>> > >
>> > > (Apologies, I do not know my way around cream.)
>> > >
>> > >
>> > > Thanks,
>> > >
>> > > Simon
>> > >
>> > >
>> > > ________________________________
>> > >
>> > > To unsubscribe from the TB-SUPPORT list, click the following link:
>> > >https://www.jiscmail.ac.uk/cgi-bin/webadmin?SUBED1=TB-SUPPORT&A=1
>> >
>> >
>> >
>> > --
>> > Sent from the pit of despair
>> >
>> > -----------------------------------------------------------
>> > [log in to unmask] <mailto:[log in to unmask]>
>> > HEP Group/Physics Dep
>> > Imperial College
>> > London, SW7 2BW
>> > Tel: +44-(0)20-75947810
>> >http://www.hep.ph.ic.ac.uk/~dbauer/
>> <http://www.hep.ph.ic.ac.uk/%7Edbauer/>
>
>
--
Steve Jones [log in to unmask]
Grid System Administrator office: 220
High Energy Physics Division tel (int): 43396
Oliver Lodge Laboratory tel (ext): +44 (0)151 794 3396
University of Liverpool http://www.liv.ac.uk/physics/hep/