On 06/11/2017 16:48, Matt Doidge wrote:
> Hello all,
> It's the first Monday of the month, so it's time to look at all the
> tickets. Just when every site in the UK gets an IPv6 ticket...
>
> 43 Open UK Tickets this month
>
> Munging together all 15 "IPv6 deployment at WLCG Tier-2 Sites", I
> notice that only 5 have been acknowledged so far - but then the tickets
> did land on a Friday. We can discuss these in the meeting.
>
> Storage Accounting Deployment
> These were the other big blob of tickets to land in the last week, for
> Oxford, Birmingham, Durham, Manchester and Brunel. The Brunel ticket
> mentions the latest monitoring page for this:
> http://goc-accounting.grid-support.ac.uk/storagetest/storagesitesystems.html
>
>
> Now to go over the 23 regular tickets, site by site.
>
> SUSSEX
> https://ggus.eu/?mode=ticket_info&ticket_id=122772 (11/7/16)
> The webdav/xroot ticket - after rebuilding the system from scratch and
> getting help from Dan it looks like xroot still isn't playing ball.
> The last update has a few questions in it that could do with some storage
> experts to weigh in on. In progress (18/10)
>
> RALPP
> https://ggus.eu/?mode=ticket_info&ticket_id=131328 (25/10)
> CMS "low hammercloud xroot success rate" ticket. Chris has been
> working hard on this, but is left in need of some answers looking at
> his last post. Waiting for reply (30/10)
>
> https://ggus.eu/?mode=ticket_info&ticket_id=131565 (2/11)
> A CMS ticket for local stageout failures, due to the "unmerged" area
> filling up by the looks of it. Chris increased the size of this area,
> and asked some questions on quotas. Waiting for reply (2/11)
>
> https://ggus.eu/?mode=ticket_info&ticket_id=130264 (28/8)
> Biomed ticket about invalid publishing from the RALPP CEs. The problem
> seems to have fixed itself despite no joy on the matching Brunel
> ticket (130263) and Chris not doing anything. Chris asks if Biomed
> wants to still track the issue. Waiting for reply (30/10)
>
> OXFORD
> https://ggus.eu/?mode=ticket_info&ticket_id=129931 (4/8)
> Failing atlas http SAM tests at Oxford. Have you tried to upgrade your
> DPM yet? On hold (19/9)
>
> CAMBRIDGE
> https://ggus.eu/?mode=ticket_info&ticket_id=130787 (28/9)
> LHCb jobs dying at Cambridge. John tried to fix the problem by upping
> the CPU limit, but the ticket needs feedback from LHCb. Waiting for
> reply (6/11)
>
This isn't quite correct - I volunteered to do so, but was told this
wasn't going to help. I *am* tempted to close this ticket if LHCb don't
respond very soon.
John
> BRISTOL
> https://ggus.eu/?mode=ticket_info&ticket_id=131590 (3/11)
> A Friday ticket from CMS, regarding network links or something (I
> still don't speak CMS). Assigned (3/11)
>
> https://ggus.eu/?mode=ticket_info&ticket_id=131641 (6/11)
> A fresh CMS ticket, that could be related to the previous one - this
> is about connection problems killing transfers ("connection limit
> exceeded" errors). Assigned (6/11)
>
> BIRMINGHAM
> https://ggus.eu/?mode=ticket_info&ticket_id=129930 (4/8)
> Failing atlas http SAM tests at Birmingham. Mark put in a handy update
> this morning, with plans to reinstall in the next couple of weeks to
> see if that helps. Thanks Mark! On hold (6/11)
>
> SHEFFIELD
> https://ggus.eu/?mode=ticket_info&ticket_id=131472 (31/10)
> Atlas transfers having trouble at Sheffield. Elena notes that she is
> working on balancing disk servers, which we all know is slow work. In
> progress (31/10)
>
> MANCHESTER
> https://ggus.eu/?mode=ticket_info&ticket_id=131171 (18/10)
> Atlas VAC jobs failing at Manchester - this has been discussed heavily
> on lists and in the atlas uk meetings, and I think some conclusions
> have been made? Or did I dream that? In progress (24/10)
>
> LIVERPOOL
> https://ggus.eu/?mode=ticket_info&ticket_id=131623 (4/11)
> Atlas deletion error ticket - Steve is on it, the problem being on one
> of the disk servers. In progress (6/11)
>
> QMUL
> https://ggus.eu/?mode=ticket_info&ticket_id=130262 (28/8)
> Biomed complaining that the QM Storm SE is publishing invalid glue2 data.
> Daniel spotted that this is due to storm not publishing glue2 data at
> all. Biomed wants to know if there's a ticket about this - perhaps we
> should suggest they submit one? In progress (27/10)
>
> IMPERIAL
> (who win kudos for already closing their IPv6 tickets)
>
> https://ggus.eu/?mode=ticket_info&ticket_id=131126 (16/10)
> Debugging CMS job problems. After some digging it looks like the jobs
> are having problems accessing files that they should be able to
> without issue. A mystery. Waiting for reply (is this still the right
> status?) (2/11)
>
> https://ggus.eu/?mode=ticket_info&ticket_id=131663 (6/11)
> A fresh ticket from Brian, asking to check on the status of a file.
> Assigned (6/11)
>
> BRUNEL
> https://ggus.eu/?mode=ticket_info&ticket_id=130263 (28/8)
> The other Biomed publishing ticket, waiting on the ARC devs to patch
> the patch that Raul has been trying out. On hold (13/10)
>
> TIER 1
> https://ggus.eu/?mode=ticket_info&ticket_id=131652 (6/11)
> Jobs failing with gfal-copy errors, although Brian hasn't been able to
> replicate them. In progress (6/11)
>
> https://ggus.eu/?mode=ticket_info&ticket_id=131213 (19/10)
> CMS having issues with fallback requests to RAL, tracked down to some
> dodgy xrootd servers which were restarted. Waiting on hearing if
> things are fixed. Waiting for reply (23/10)
>
> https://ggus.eu/?mode=ticket_info&ticket_id=131299 (24/10)
> CMS Hammercloud failure ticket. A hint has been dropped that the error
> message might be due to root CAs being off on the ECHO server. Has
> this been checked? In progress (24/10)
>
> https://ggus.eu/?mode=ticket_info&ticket_id=130949 (6/10)
> CMS transfers failing to RAL disk, the root problem caused by there
> being no room at the disk servers! Chris has been helping, finding
> about 100TB unaccounted for. Any luck generating those file lists? In
> progress (25/10)
>
> https://ggus.eu/?mode=ticket_info&ticket_id=130207 (24/8)
> A MICE ticket regarding Castor. I think all the issues have been
> solved; this ticket is just being left open whilst "new" disk servers
> are freed up to go into the disk pool. How goes that process? On hold
> (25/10)
>
> https://ggus.eu/?mode=ticket_info&ticket_id=127597 (7/4)
> A CMS request to check the RAL networking. Gareth updated the ticket a
> few weeks back with news on the state of the firewall. On hold (5/10)
>
> https://ggus.eu/?mode=ticket_info&ticket_id=124876 (7/11/16)
> ECHO gridftp ops tests failing, due to the tests not having the right
> path in them. Alastair has poked the ticket to get the tests fixed
> (125026)
>
> https://ggus.eu/?mode=ticket_info&ticket_id=117683 (18/11/15)
> Castor not publishing glue 2. This is being worked on slowly in the
> background, but the ticket could do with a quarterly update. On hold
> (6/7)
>
> And that's all folks! Hopefully I didn't miss any tickets out.
>
> Cheers,
> Matt