Hi Luke,
According to the perfsonar/maddash results, there is packet loss inbound
to Bristol from most UK sites:
http://netmon02.grid.hep.ph.ic.ac.uk:8080/maddash-webui/index.cgi?grid=UK%20sites%20-%20UK%20Cloud%20OWAMP%20Mesh%20Test
Do you know if your WAN link is heavily loaded?
RAL seems to have its own issues:
http://netmon02.grid.hep.ph.ic.ac.uk:8080/maddash-webui/details.cgi?uri=/maddash/grids/UK+sites+-+UK+Cloud+OWAMP+Mesh+Test/lcgps01.gridpp.rl.ac.uk/perfsonar-lt.tier2.hep.manchester.ac.uk/Loss
Duncan
On 08/08/14 10:42, L Kreczko wrote:
> Dear experts,
>
> It seems that Bristol is suffering connection problems, mostly but
> not exclusively to US sites.
> As an example, perfsonar shows for Bristol <-> FNAL a forward
> direction packet loss of 0.02% and a reverse direction packet loss of
> 0.22%. However, I do see peaks every second day of up to 90% packet
> loss [1]! Also, the throughput from Bristol to FNAL (src-dst) is
> relatively stable at around 83 MB/s while the throughput from FNAL to
> Bristol (dst-src) varies widely between 2 and 580 MB/s [2].
>
> Whatever the problem is, it manifests itself in (at least) three
> different ways:
> - timeouts for phedex transfers
> (https://ggus.eu/index.php?mode=ticket_info&ticket_id=106554)
> - job connection loss
> (https://ggus.eu/index.php?mode=ticket_info&ticket_id=106325)
> - crab submission problems [3]
>
> We do not see similar issues with UK sites (unless I missed something).
>
> Also, if perfsonar is an indication of anything, it seems that all UK
> sites have some sort of problem with FNAL:
> http://netmon02.grid.hep.ph.ic.ac.uk:8080/maddash-webui/index.cgi?grid=UK%20sites%20-%20intercloud%20OWAMP%20Mesh%20Test
>
> Could you please advise on how to proceed?
>
> Cheers,
> Luke
>
>
> [1]
> https://lcgnetmon.phy.bris.ac.uk/serviceTest/delayGraph.cgi?url=http://localhost:8085/perfSONAR_PS/services/pSB&key=bb51cf4a6c0119b89f465a8ee0b7889a&keyR=61401bff68207fefd5b98202a1542bb4&dstIP=131.225.205.141&srcIP=137.222.171.35&dst=psonar2.fnal.gov&src=lcgnetmon.phy.bris.ac.uk&type=TCP&length=604800&bucket_width=0.001
>
> [2]
> http://lcgnetmon02.phy.bris.ac.uk/serviceTest/bandwidthGraph.cgi?url=http://localhost:8085/perfSONAR_PS/services/pSB&key=a659b714989b284caa0e239104ddb846&keyR=fa7b25f3b9a306f0119865ced4a6c01c&dstIP=131.225.205.139&srcIP=137.222.171.39&dst=psonar1.fnal.gov&src=lcgnetmon02.phy.bris.ac.uk&type=TCP&length=2592000
>
> [3]
> crab: Checking available resources...
> crab: Found compatible site(s) for job 1
> crab: 1 blocks of jobs will be submitted
> crab: serverName from Task DB is submit-6.t2.ucsd.edu
> crab: contacting remote host submit-6.t2.ucsd.edu
> crab: COPY FILES TO REMOTE HOST
> crab: SUBMIT TO REMOTE GLIDEIN FRONTEND
> crab: Job not submitted
> crab: Submitting job(s)
> ERROR: Failed to connect to local queue manager
> SECMAN:2007:Failed to end classad message.
> CONDOR_SUBMIT-EXIT-STATUS IS 1
>
> [4] Bonus for everyone who reads the whole email: RAL <-> Bristol:
> http://lcgnetmon02.phy.bris.ac.uk/serviceTest/bandwidthGraph.cgi?url=http://localhost:8085/perfSONAR_PS/services/pSB&key=55c6d7e4529a5ccbb6a0d1037d7d7542&keyR=049462e30858003ffde129e8e2980d08&dstIP=137.222.171.39&srcIP=130.246.176.110&dst=lcgnetmon02.phy.bris.ac.uk&src=lcgps02.gridpp.rl.ac.uk&type=TCP&length=2592000
>
>