Dear All,

A workaround has been implemented to resolve the problem affecting the BDII
server.

The problem can be characterized as affecting only the 137.138.152.128/25
network.  The interface card that connects this network was improperly
populating the forwarding table for certain network flows defined by IP
address, protocol, port and TOS.  This malfunction prevented the servers
within this network from sending out packets matching these improperly
(dynamically) generated forwarding entries.
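
To check whether a given host sits inside the affected range, a minimal
sketch along the following lines may help. It uses only Python's standard
ipaddress module; the function name in_affected_network is hypothetical,
chosen for illustration. The first sample address is the one that
lxn1189.cern.ch resolves to in the ping output quoted later in this thread;
the second is an arbitrary address outside the /25, picked for contrast.

```python
import ipaddress

# Network named in the announcement above.
AFFECTED_NET = ipaddress.ip_network("137.138.152.128/25")


def in_affected_network(addr: str) -> bool:
    """Return True if addr falls inside the affected /25 network."""
    return ipaddress.ip_address(addr) in AFFECTED_NET


# Address of lxn1189.cern.ch as shown in the quoted ping output.
print(in_affected_network("137.138.152.218"))  # True
# Arbitrary address in the lower half of the /24, outside the /25.
print(in_affected_network("137.138.152.10"))   # False
```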

The forwarding mode configuration on this interface was modified to allow
proper forwarding of packets.  However, this is not consistent with CERN's
standard network configuration.  This problem will be further investigated
to obtain more detailed information.  The network team will most likely
require additional intervention; if so, we will announce a maintenance period.

Best Regards,
Min 

-----Original Message-----
From: LHC Computer Grid - Rollout [mailto:[log in to unmask]] On
Behalf Of Jiri Kosina
Sent: Monday, February 21, 2005 5:20 PM
To: [log in to unmask]
Subject: Re: [LCG-ROLLOUT] CERN BDII

On Mon, 21 Feb 2005, Dimitris Zilaskos wrote:

> From our site, we can ping and telnet to the RB/BDII at CERN from all
> our nodes except our CE (node001.grid.auth.gr), which can communicate
> with lxn1188.cern.ch/lxn1182.cern.ch over UDP and ICMP, but not over
> TCP. It appears the packets are blocked by a firewall.

This is a very interesting observation. Even more interesting is that we
have exactly the same situation on our farms:

on CE (golias25.farm.particle.cz):

[root@golias25 root]# ping -c 1 lxn1189.cern.ch
PING lxn1189.cern.ch (137.138.152.218) from 147.231.25.25 : 56(84) bytes of data.
64 bytes from lxn1189.cern.ch (137.138.152.218): icmp_seq=1 ttl=46 time=33.0 ms

--- lxn1189.cern.ch ping statistics ---
1 packets transmitted, 1 received, 0% loss, time 0ms
rtt min/avg/max/mdev = 33.094/33.094/33.094/0.000 ms
[root@golias25 root]# telnet lxn1189.cern.ch 2170
Trying 137.138.152.218...

And here it hangs.

But when I ssh to a worker node, it works like a charm:

[root@goliasx44 root]# ping -c 1 lxn1189.cern.ch
PING lxn1189.cern.ch (137.138.152.218) from 147.231.25.44 : 56(84) bytes of data.
64 bytes from lxn1189.cern.ch (137.138.152.218): icmp_seq=1 ttl=46 time=32.5 ms

--- lxn1189.cern.ch ping statistics ---
1 packets transmitted, 1 received, 0% loss, time 0ms
rtt min/avg/max/mdev = 32.581/32.581/32.581/0.000 ms
[root@goliasx44 root]# telnet lxn1189.cern.ch 2170
Trying 137.138.152.218...
Connected to lxn1189.cern.ch.
Escape character is '^]'.

(tried from a few random worker nodes, everything worked).
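
For anyone who wants to repeat the telnet check on several nodes without
waiting on a hung session, here is a rough sketch, not a tested site tool.
It attempts a TCP connection with a timeout and reports success or failure.
The function name tcp_port_open and the 5-second default timeout are
assumptions made for illustration; the host and port in the usage comment
are the ones from the transcripts above.

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections and name resolution failures.
        return False


# Example use on a CE or worker node (performs a real network connection):
#   print(tcp_port_open("lxn1189.cern.ch", 2170))
```

A False result combined with a working ping would match the symptom seen on
the CE: ICMP passes while the TCP connection never completes.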

The situation is the same on our second PRAGUE farm (CE skurut17.cesnet.cz).

Strange situation indeed, isn't it?

--
Jiri Kosina
Institute of Physics, Academy of Sciences of the Czech Republic