No, that ticket looks like it's been forgotten :-(
On Tue, 14 Aug 2018 at 12:01, George, Simon <[log in to unmask]> wrote:
>
> Thanks Jeremy.
>
> I'm surprised there is no response at all on my ticket. Does it look correct to you? Assigned to "TPM"; I don't see any way to assign it to a VO.
>
> Thankfully the problem seems to have gone away for now.
>
>
>
> ________________________________
> From: Testbed Support for GridPP member institutes <[log in to unmask]> on behalf of Jeremy Coles <[log in to unmask]>
> Sent: 14 August 2018 10:30
> To: [log in to unmask]
> Subject: Re: atlas job filling up /var/cream_sandbox/atlaspil/ with huge log file
>
> Simon/All,
>
> In yesterday’s WLCG ops meeting KIT reported something that looks similar:
>
> "Several ARC-CEs disabled by jobs with very large output files each (17G-22G) filling up disks for input/output staging”.
>
> I’m looking for their ticket.
>
> Jeremy
>
>
>
>
>
> > On 13 Aug 2018, at 15:36, Stephen Jones <[log in to unmask]> wrote:
> >
> > What batch system do you use, Simon?
> >
> > The CREAM9999999 ID is the key info.
> >
> > From our CREAM/Torque days, I remember that CREAM writes logs under /var/log/cream/ recording which CREAM job ID maps to which batch system job ID.
> >
> > I still have the scripts I used to filter that information (see below). The first script, cream_tracer.pl, reads a CREAM log piped in on standard input, groups together the sequence of messages for each job, and prints them out in blocks.
> >
> > The second script, cream_filter.pl, takes the output of the first. You also give it one or more "key" parameters on the command line, and it restricts the output to the job blocks that contain at least one of those keys.
> >
> > The output will include the batch system ID. From memory, they were used like this:
> >
> > cat <SOMECREAMLOGFILE> | ./cream_tracer.pl | ./cream_filter.pl 791629196
> >
> > where 791629196 is the CREAM ID number from the file name.
> >
> > Cheers,
> >
> > Ste
> >
> > --- SCRIPTS ---
> >
> > 1) cream_tracer.pl
> >
> > #!/usr/bin/perl
> > use strict;
> > use warnings;
> >
> > # CREAM job ID -> list of log lines that mention that job
> > my %messageBlocks;
> > # CREAM job IDs in the order their "Job inserted" line was seen
> > my @messageBlocksOrder;
> >
> > while (my $line = <STDIN>) {
> >   chomp($line);
> >
> >   if ($line =~ /Job inserted\. JobId = (CREAM\d+)/) {
> >     # First record for a job: start a new block for it
> >     my $creamJobId = $1;
> >     $messageBlocks{$creamJobId} = [$line];
> >     push(@messageBlocksOrder, $creamJobId);
> >   }
> >   elsif ($line =~ /(CREAM\d+)/) {
> >     # Later record: keep it only if the job's insertion was seen earlier
> >     my $cid = $1;
> >     if (defined($messageBlocks{$cid})) {
> >       push(@{$messageBlocks{$cid}}, $line);
> >     }
> >   }
> > }
> >
> > # Print one block per job, in insertion order
> > foreach my $cid (@messageBlocksOrder) {
> >   print("\nCREAM Records for $cid\n");
> >   foreach my $l (@{$messageBlocks{$cid}}) {
> >     print("$l\n");
> >   }
> > }
> >
> > 2) cream_filter.pl
> >
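> > #!/usr/bin/perl
> > use strict;
> > use warnings;
> >
> > # Keys to search for (e.g. the CREAM ID number), given on the command line
> > my @terms = @ARGV;
> >
> > my %blocks;   # block name (CREAM ID) -> lines of that block
> > my @order;    # block names in the order they were read
> > my $name;
> > while (my $l = <STDIN>) {
> >   if ($l =~ /CREAM Records for.*\s(CREAM\d+)/) {
> >     # Header line written by cream_tracer.pl: start a new block
> >     $name = $1;
> >     push(@order, $name) unless exists($blocks{$name});
> >     $blocks{$name} = [$l];
> >   }
> >   elsif (defined($name)) {
> >     push(@{$blocks{$name}}, $l);
> >   }
> > }
> >
> > # Print every block that contains at least one of the keys
> > foreach my $n (@order) {
> >   my @block = @{$blocks{$n}};
> >
> >   my $match = 0;
> >   LINE: foreach my $l (@block) {
> >     foreach my $t (@terms) {
> >       # Plain substring test, so keys are not interpreted as regexes
> >       if (index($l, $t) >= 0) {
> >         $match = 1;
> >         last LINE;
> >       }
> >     }
> >   }
> >   print(@block) if $match;
> > }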
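Since the pipeline is keyed on the number inside the CREAM job directory name, a minimal shell sketch (not one of Ste's scripts) that pulls that number out of an output-sandbox path, like the one in Simon's first message, and feeds it to the two scripts could look like this. The function names `cream_id_from_path` and `trace_cream_job` are hypothetical, and the log file is passed in as an argument because the thread does not say which file under /var/log/cream/ holds the mapping.

```shell
#!/bin/sh
# Hypothetical helper: print the digits of the CREAM<digits> directory in a
# CREAM sandbox path, e.g. .../79/CREAM791629196/OSB/4998630.20.out -> 791629196.
# Returns non-zero if the path contains no "/CREAM<digits>" component.
cream_id_from_path() {
  id=${1#*/CREAM}   # drop everything up to and including the first "/CREAM"
  id=${id%%/*}      # drop everything from the next "/" onwards
  case $id in
    ''|*[!0-9]*) return 1 ;;   # nothing left, or not purely digits
  esac
  printf '%s\n' "$id"
}

# Hypothetical wrapper around the cream_tracer.pl | cream_filter.pl pipeline.
# $1 = a CREAM log file under /var/log/cream/, $2 = the sandbox file path.
# Assumes both Perl scripts are executable in the current directory.
trace_cream_job() {
  job=$(cream_id_from_path "$2") || return 1
  ./cream_tracer.pl < "$1" | ./cream_filter.pl "$job"
}
```

Called with a CREAM log and the sandbox path from this thread, it should pass 791629196 to cream_filter.pl, just as in Ste's example command.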
> >
> >
> >
> >
> >
> >
> > On 13/08/18 10:59, George, Simon wrote:
> >>
> >> Thanks Alastair.
> >>
> >> To you or any other helpful person: how do I get from a CREAM output file path to a batch ID?
> >>
> >>
> >>
> >> ------------------------------------------------------------------------
> >> *From:* Testbed Support for GridPP member institutes <[log in to unmask]> on behalf of Alastair Dewhurst <[log in to unmask]>
> >> *Sent:* 13 August 2018 10:01
> >> *To:* [log in to unmask]
> >> *Subject:* Re: atlas job filling up /var/cream_sandbox/atlaspil/ with huge log file
> >> Hi
> >>
> >> For any ATLAS issue, always cc UK cloud support; if anyone is going to do anything about this, it will be someone on there. They are all on TB-support anyway, but it's always a good idea to make sure it isn’t missed.
> >>
> >> You should also always provide the batch farm IDs of the jobs. In the Panda monitor you can search for jobs by batch ID, and this will show who actually submitted them, etc., so that ATLAS can stop them from submitting more.
> >>
> >> I am afraid I can’t help you with blocking them at the CREAM level.
> >>
> >> Alastair
> >>
> >>
> >>> On 13 Aug 2018, at 09:53, George, Simon <[log in to unmask]> wrote:
> >>>
> >>> Thanks very much.
> >>> Done: https://ggus.eu/index.php?mode=ticket_info&ticket_id=136675&come_from=submit
> >>> I wonder if anyone else has seen jobs like these?
> >>>
> >>>
> >>> ------------------------------------------------------------------------
> >>> *From:* Testbed Support for GridPP member institutes <[log in to unmask]> on behalf of Daniela Bauer <[log in to unmask]>
> >>> *Sent:* 13 August 2018 09:15
> >>> *To:* [log in to unmask]
> >>> *Subject:* Re: atlas job filling up /var/cream_sandbox/atlaspil/ with huge log file
> >>> ggus.org: pick "VO specific" and choose Atlas. And make it "urgent".
> >>> Cheers,
> >>> Daniela
> >>>
> >>> On Mon, 13 Aug 2018 at 09:10, George, Simon <[log in to unmask]> wrote:
> >>> >
> >>> > Thanks Daniela.
> >>> >
> >>> > How do I submit a VO ticket? Can you point me to the relevant ticketing system/email address please?
> >>> >
> >>> >
> >>> >
> >>> > ________________________________
> >>> > From: Testbed Support for GridPP member institutes <[log in to unmask]> on behalf of Daniela Bauer <[log in to unmask]>
> >>> > Sent: 13 August 2018 09:05
> >>> > To: [log in to unmask]
> >>> > Subject: Re: atlas job filling up /var/cream_sandbox/atlaspil/ with huge log file
> >>> >
> >>> > An ATLAS-specific VO ticket? At least then the other sites can see it.
> >>> >
> >>> > Cheers,
> >>> > Daniela
> >>> > On Mon, 13 Aug 2018 at 09:02, George, Simon <[log in to unmask]> wrote:
> >>> > >
> >>> > > Since last night I had two more jobs with the same problem.
> >>> > >
> >>> > >
> >>> > >
> >>> > > ________________________________
> >>> > > From: Testbed Support for GridPP member institutes <[log in to unmask]> on behalf of George, Simon <[log in to unmask]>
> >>> > > Sent: 12 August 2018 22:34
> >>> > > To: [log in to unmask]
> >>> > > Subject: atlas job filling up /var/cream_sandbox/atlaspil/ with huge log file
> >>> > >
> >>> > >
> >>> > > Hi,
> >>> > >
> >>> > > I have a rogue ATLAS job that has broken one of my CREAM CEs with a 10 GB log file in a pilot directory:
> >>> > >
> >>> > > /var/cream_sandbox/atlaspil/CN_Robot__ATLAS_Pilot2_CN_531497_CN_atlpilo2_OU_Users_OU_Organic_Units_DC_cern_DC_ch_atlas_Role_pilot_Capability_NULL_platl017/79/CREAM791629196/OSB/4998630.20.out
> >>> > >
> >>> > >
> >>> > > At some point the log file starts listing the files unpacked from a tarball, like this:
> >>> > >
> >>> > > tarball_PandaJob_4024428114_ANALY_RHUL_SL6/usr/DiLepAna/1.0.0/InstallArea/x86_64-slc6-gcc62-opt/include/AnalysisHelpers/AnalysisHelpers/AnalysisHelpers
> >>> > >
> >>> > > tarball_PandaJob_4024428114_ANALY_RHUL_SL6/usr/DiLepAna/1.0.0/InstallArea/x86_64-slc6-gcc62-opt/include/AnalysisHelpers/AnalysisHelpers/AnalysisHelpers/AnalysisHelpers
> >>> > >
> >>> > > tarball_PandaJob_4024428114_ANALY_RHUL_SL6/usr/DiLepAna/1.0.0/InstallArea/x86_64-slc6-gcc62-opt/include/AnalysisHelpers/AnalysisHelpers/AnalysisHelpers/AnalysisHelpers/AnalysisHelpers
> >>> > >
> >>> > > and carries on like that forever, adding another /AnalysisHelpers at the end each time, soon changing to an error message
> >>> > >
> >>> > > "Cannot stat: Too many levels of symbolic links" each time.
> >>> > > There is clearly some kind of symlink loop in this tar file.
> >>> > > Perhaps some of you see other jobs with the same problem at your sites?
> >>> > >
> >>> > > This log file is up to around 10 GB and has filled up my /var, so I will copy the first 5000 lines in case they are useful, and then delete it.
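The clean-up described here can be sketched in shell as follows. This is a hedged illustration rather than what Simon actually ran, and the function name `save_head_and_truncate` is hypothetical. One deliberate difference: it empties the file instead of deleting it, because on Linux the space used by a deleted file is only released once every process holding it open has closed it, whereas truncation frees the space immediately.

```shell
#!/bin/sh
# Hypothetical helper: save the first N lines (default 5000) of a runaway
# log file for later inspection, then empty the original to free space.
# $1 = oversized log file, $2 = where to save the excerpt, $3 = optional N.
save_head_and_truncate() {
  n=${3:-5000}
  head -n "$n" "$1" > "$2" || return 1
  # Truncate instead of rm: space held by a deleted file is not released
  # while any process still has it open.
  : > "$1"
}
```

Point the second argument at a filesystem other than the full /var.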
> >>> > >
> >>> > >
> >>> > > Can anyone advise if/how to report it to ATLAS?
> >>> > >
> >>> > > And how can I identify and stop this job?
> >>> > >
> >>> > > (Apologies, I do not know my way around cream.)
> >>> > >
> >>> > >
> >>> > > Thanks,
> >>> > >
> >>> > > Simon
> >>> > >
> >>> > >
> >>> > > ________________________________
> >>> > >
> >>> > > To unsubscribe from the TB-SUPPORT list, click the following link:
> >>> > >https://www.jiscmail.ac.uk/cgi-bin/webadmin?SUBED1=TB-SUPPORT&A=1
> >>> >
> >>> >
> >>> >
> >>> > --
> >>> > Sent from the pit of despair
> >>> >
> >>> > -----------------------------------------------------------
> >>> > [log in to unmask]
> >>> > HEP Group/Physics Dep
> >>> > Imperial College
> >>> > London, SW7 2BW
> >>> > Tel: +44-(0)20-75947810
> >>> > http://www.hep.ph.ic.ac.uk/~dbauer/
> >>>
> >>>
> >>>
> >>> --
> >>> Sent from the pit of despair
> >>>
> >>> -----------------------------------------------------------
> >>> [log in to unmask]
> >>> HEP Group/Physics Dep
> >>> Imperial College
> >>> London, SW7 2BW
> >>> Tel: +44-(0)20-75947810
> >>> http://www.hep.ph.ic.ac.uk/~dbauer/
> >
> > --
> > Steve Jones                             [log in to unmask]
> > Grid System Administrator               office: 220
> > High Energy Physics Division            tel (int): 43396
> > Oliver Lodge Laboratory                 tel (ext): +44 (0)151 794 3396
> > University of Liverpool                 http://www.liv.ac.uk/physics/hep/
> >
>



-- 
Sent from the pit of despair

-----------------------------------------------------------
[log in to unmask]
HEP Group/Physics Dep
Imperial College
London, SW7 2BW
Tel: +44-(0)20-75947810
http://www.hep.ph.ic.ac.uk/~dbauer/
