


Just to give you an update...

# Mon, 2019-02-18

I updated my head node with a new bug fix provided by DPM.

I also started a huge cleanup operation because CMS had more than 1 PB
of data here.

# Wed

I noticed that xroot was still crashing and transfers were very slow.

I started adjusting parameters both in DOME and the DB node.

In the process, I noticed that restarting the DB (which runs on a
separate node) can crash xroot. The DPM developers are investigating.

# Yesterday

Petr Vocak updated his storage to 1.11 and started seeing DOME stopping
frequently. He runs an ATLAS site.

Frederic Schaer also reported crashes in xroot.

So... now I'm not alone.

I decided to tune my DB further:

  - I now accept 4 times the number of connections that the DPM hints
    page suggests.

  - I have multiplied by 5 the size of two of the pools used by DOME.

Possibly a coincidence, but I've had no xroot crashes for 17 hours.

# Today, 6am

I reported the xroot vs DB crashes and the new parameters.

DPM started building a new image.

Matt reported observing a pattern in his storage that is exactly the
same as the one that has sent Brunel to the CMS waiting room:

/Hello all,/

/We're seeing some odd behaviour with atlas jobs using rucio to upload
their files to our site. The pattern atlas are seeing is that rucio
uploads a file, which succeeds, then immediately checks that the file is
in place, which fails. It then tries to re-upload, which fails as the
file exists./
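
The sequence Matt describes (upload succeeds, existence check fails,
re-upload is rejected) can be sketched with a toy model. This only
illustrates the symptom as a client would see it, not rucio or DPM
internals: `ToyStorage`, its `lookup_broken` flag and
`upload_with_verify` are all hypothetical names, and the real cause of
the failed check is not known at this point.

```python
class ToyStorage:
    """Hypothetical in-memory stand-in for a storage endpoint (not DPM's API)."""

    def __init__(self, lookup_broken=False):
        self._files = {}
        # Assumption for illustration only: lookups may fail even though
        # the file was written successfully.
        self.lookup_broken = lookup_broken

    def upload(self, name, data):
        # Refuse to overwrite, as the site does in the report.
        if name in self._files:
            raise FileExistsError(name)
        self._files[name] = data

    def exists(self, name):
        if self.lookup_broken:
            return False
        return name in self._files


def upload_with_verify(storage, name, data):
    """Upload, check, and retry once; return the steps that were observed."""
    steps = []
    storage.upload(name, data)
    steps.append("upload ok")
    if storage.exists(name):
        steps.append("check ok")
        return steps
    steps.append("check failed")
    try:
        storage.upload(name, data)
        steps.append("re-upload ok")
    except FileExistsError:
        steps.append("re-upload failed: file exists")
    return steps


print(upload_with_verify(ToyStorage(lookup_broken=True), "job.out", b"payload"))
# prints ['upload ok', 'check failed', 're-upload failed: file exists']
```

With `lookup_broken=False` the same call returns after "check ok", which
is the behaviour the jobs expect.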


# 1 hour ago

I have a new update in place that was released by DPM devel at 11 am.


