On 06/07/2012 09:25 AM, alan buxey wrote:
>
> we definitely need more feedback from google - e.g. client ipv4/ipv6 address
> rather than just the DNS...
If you have URL logging on your default route, you can see the checks for
yourself as they pass by.
The checks basically work as follows:

  Client searches for something on Google.
  If the check has been run recently (determined in an unknown fashion), stop.
  Else:
    Client generates a unique "blob".
    HTTP GET to <blob>.i1.v4.ipv6-exp.l.google.com
    HTTP GET to <blob>.i2.ds.ipv6-exp.l.google.com
    30-second pause (assuming the browser is still open), then:
    HTTP GET to <blob>.s1.v4.ipv6-exp.l.google.com
A "fail" seems to be the absence of the "ds" (dualstack) checkin.
Absence of the final "s1" checkin is irrelevant - that just seems to
"call home" with some stats in the URL arguments.
I've been writing a script locally that parses the output of our URL
logs, collating the checkins by blob, and by that criterion I do see some
"brokenness"; a rough sketch of the collation logic follows.
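This isn't the actual script, just a Python sketch of the idea. It
assumes one URL per log line with the requested hostname somewhere in
the line; the log format is an assumption about our logs, not anything
Google specifies:

    import re
    import sys
    from collections import defaultdict

    # Matches <blob>.<stage>.ipv6-exp.l.google.com in a URL log line.
    PROBE = re.compile(r"(?P<blob>[^.]+)\.(?P<stage>i1\.v4|i2\.ds|s1\.v4)"
                       r"\.ipv6-exp\.l\.google\.com")

    def collate(lines):
        seen = defaultdict(set)   # blob -> set of stages observed
        for line in lines:
            m = PROBE.search(line)
            if m:
                seen[m.group("blob")].add(m.group("stage"))
        return seen

    if __name__ == "__main__":
        seen = collate(sys.stdin)
        # A "fail" is a blob whose i1.v4 checkin arrived but whose
        # i2.ds (dual-stack) checkin never did; the s1.v4 stats
        # checkin is ignored, as its absence seems irrelevant.
        for blob, stages in seen.items():
            if "i1.v4" in stages and "i2.ds" not in stages:
                print("broken: %s" % blob)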
However, investigation of the individual cases (and we're talking about
10-15/hour) doesn't show any root cause. Some failures are on IPv4-only
machines (e.g. WinXP without IPv6 installed) with only "A" queries being
made for the unique names in DNS.
Other failures seem to be connectivity problems to the checkin node -
I've got netflow locally that suggests 4x TCP SYN packets left our site,
but no answer came back.
And today, Google tell me our brokenness rate has dropped off, despite
us having done nothing (unless changing the source IP of our resolvers
magically fixed it).
I remain unconvinced that the check is working.
All of which makes me wonder: *if* the Google brokenness check is
mis-reporting, then maybe all the other checks we've seen talked about
are as well. It's a big "if" - but what if IPv6 brokenness is far, far
lower than we really think? What if the browser-based checks are
reporting a whole suite of failures (e.g. slow-ish DNS lookups, bad TCP
stack behaviour) only some of which are IPv6-related?
Hmm.