Hi Luke
The highest rate I can see you get to sites within the UK is to QMUL with ~0.5 Gbps:
http://netmon02.grid.hep.ph.ic.ac.uk:8080/maddash-webui/details.cgi?uri=/maddash/grids/UK+sites+-+UK+Cloud+BWCTL+Mesh+Test/lcgnetmon02.phy.bris.ac.uk/perfsonar-bandwidth.esc.qmul.ac.uk/Throughput
I understand you have 10 Gbps to your site and a 1 Gbps NIC on the perfsonar host. It would be great if you could get the NIC upgraded to 10 Gbps since, judging by other UK sites that are connected at 10 Gbps, you should be able to get more like 4-5 Gbps. See for example the green squares here
http://perfsonar-itb.grid.iu.edu/maddash-webui/index.cgi?dashboard=UK%20sites
(this is a similar maddash server to ours at Imperial but with different colour scales). Anything > 0.9 Gbps is green, so it is an easy way to spot the well-connected (10 Gbps?) sites. At the moment Bristol has no green squares.
Perhaps this is a firewall problem: notice that the data rates to QMUL in the link above are flat at 0.53 Gbps, but inbound there is more variability, which is what you describe with FNAL. Have a look at slides 22-37 here
https://services.geant.net/edupert/Resources/Documents/20130307-eduPERT-Zurawski.pdf
It would be interesting to know what your firewall is rated to.
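As a rough illustration of why even a small loss rate matters on a long path, here is a sketch using the Mathis et al. approximation for a single loss-limited TCP stream, rate <= C * MSS / (RTT * sqrt(p)). The MSS of 1460 bytes and the round-trip times below are assumptions for illustration only, not measured values for Bristol; the loss rate is the 0.22% Luke quotes for FNAL.

```python
import math


def mathis_throughput_bps(mss_bytes, rtt_s, loss_rate, c=math.sqrt(1.5)):
    """Approximate upper bound on single-stream TCP throughput in bits/s.

    Uses the Mathis et al. approximation: rate <= C * MSS / (RTT * sqrt(p)).
    """
    if rtt_s <= 0 or loss_rate <= 0:
        raise ValueError("rtt_s and loss_rate must be positive")
    return c * mss_bytes * 8 / (rtt_s * math.sqrt(loss_rate))


if __name__ == "__main__":
    # Assumed values: 1460-byte MSS; RTTs of 10 ms (UK-like) and 100 ms
    # (transatlantic-like) are guesses, not measurements.
    for rtt in (0.010, 0.100):
        bps = mathis_throughput_bps(1460, rtt, 0.0022)
        print(f"RTT {rtt * 1000:.0f} ms, loss 0.22%: ~{bps / 1e6:.1f} Mbps per stream")
```

The bound scales as 1/RTT, so the same loss rate hurts a transatlantic stream far more than a UK one, which would be consistent with the problems showing up mostly towards US sites.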
cheers
Duncan
On 11 Aug 2014, at 15:22, L Kreczko wrote:
> Hi Duncan,
>
> As far as I can see we are only at around 2 Gbit/s out of 10 Gbit/s.
> I will ask IT services if they have more detailed metrics.
>
> Cheers,
> Luke
>
> On 11 August 2014 15:02, Duncan Rand wrote:
>> Hi Luke
>>
>> According to the perfsonar/maddash results, there is packet loss inbound to
>> Bristol from most UK sites:
>>
>> http://netmon02.grid.hep.ph.ic.ac.uk:8080/maddash-webui/index.cgi?grid=UK%20sites%20-%20UK%20Cloud%20OWAMP%20Mesh%20Test
>>
>> Do you know if your WAN link is heavily loaded?
>>
>> RAL seems to have its own issues:
>>
>> http://netmon02.grid.hep.ph.ic.ac.uk:8080/maddash-webui/details.cgi?uri=/maddash/grids/UK+sites+-+UK+Cloud+OWAMP+Mesh+Test/lcgps01.gridpp.rl.ac.uk/perfsonar-lt.tier2.hep.manchester.ac.uk/Loss
>>
>> Duncan
>>
>>
>> On 08/08/14 10:42, L Kreczko wrote:
>>>
>>> Dear experts,
>>>
>>> It seems that Bristol is suffering connection problems, mostly but
>>> not exclusively to US sites.
>>> As an example, perfsonar shows for Bristol <-> FNAL a forward
>>> direction packet loss of 0.02% and a reverse direction packet loss of
>>> 0.22%. However, I do see peaks every second day of up to 90% packet
>>> loss [1]! Also, the throughput from Bristol to FNAL (src-dst) is
>>> relatively stable at around 83 MB/s while the throughput FNAL to
>>> Bristol (dst-src) varies widely between 580 and 2 MB/s [2].
>>>
>>> Whatever the problem it manifests itself in (at least) three different
>>> instances:
>>> - timeouts for phedex transfers
>>> (https://ggus.eu/index.php?mode=ticket_info&ticket_id=106554)
>>> - job connection loss
>>> (https://ggus.eu/index.php?mode=ticket_info&ticket_id=106325)
>>> - crab submission problems [3]
>>>
>>> We do not see similar issues with UK sites (unless I missed
>>> something).
>>>
>>> Also, if perfsonar is an indication of anything, it seems that all UK
>>> sites have some sort of problem with FNAL:
>>>
>>> http://netmon02.grid.hep.ph.ic.ac.uk:8080/maddash-webui/index.cgi?grid=UK%20sites%20-%20intercloud%20OWAMP%20Mesh%20Test
>>>
>>> Could you please advise on how to proceed?
>>>
>>> Cheers,
>>> Luke
>>>
>>>
>>> [1]
>>>
>>> https://lcgnetmon.phy.bris.ac.uk/serviceTest/delayGraph.cgi?url=http://localhost:8085/perfSONAR_PS/services/pSB&key=bb51cf4a6c0119b89f465a8ee0b7889a&keyR=61401bff68207fefd5b98202a1542bb4&dstIP=131.225.205.141&srcIP=137.222.171.35&dst=psonar2.fnal.gov&src=lcgnetmon.phy.bris.ac.uk&type=TCP&length=604800&bucket_width=0.001
>>>
>>> [2]
>>>
>>> http://lcgnetmon02.phy.bris.ac.uk/serviceTest/bandwidthGraph.cgi?url=http://localhost:8085/perfSONAR_PS/services/pSB&key=a659b714989b284caa0e239104ddb846&keyR=fa7b25f3b9a306f0119865ced4a6c01c&dstIP=131.225.205.139&srcIP=137.222.171.39&dst=psonar1.fnal.gov&src=lcgnetmon02.phy.bris.ac.uk&type=TCP&length=2592000
>>>
>>> [3]
>>> crab: Checking available resources...
>>> crab: Found compatible site(s) for job 1
>>> crab: 1 blocks of jobs will be submitted
>>> crab: serverName from Task DB is submit-6.t2.ucsd.edu
>>> crab: contacting remote host submit-6.t2.ucsd.edu
>>> crab: COPY FILES TO REMOTE HOST
>>> crab: SUBMIT TO REMOTE GLIDEIN FRONTEND
>>> crab: Job not submitted
>>> crab: Submitting job(s)
>>> ERROR: Failed to connect to local queue manager
>>> SECMAN:2007:Failed to end classad message.
>>> CONDOR_SUBMIT-EXIT-STATUS IS 1
>>>
>>> [4] Bonus for everyone that reads the whole email: RAL <-> Bristol:
>>>
>>> http://lcgnetmon02.phy.bris.ac.uk/serviceTest/bandwidthGraph.cgi?url=http://localhost:8085/perfSONAR_PS/services/pSB&key=55c6d7e4529a5ccbb6a0d1037d7d7542&keyR=049462e30858003ffde129e8e2980d08&dstIP=137.222.171.39&srcIP=130.246.176.110&dst=lcgnetmon02.phy.bris.ac.uk&src=lcgps02.gridpp.rl.ac.uk&type=TCP&length=2592000
>>>
>>>
>>
>
>
>
> --
> *********************************************************
> Dr Lukasz Kreczko +44 (0)117 928 8724
> CMS Group
> School of Physics
> University of Bristol
> *********************************************************