On Thu, 8 Apr 1999, Jon Knight wrote:
> On Thu, 8 Apr 1999, A.Dawson wrote:
> > Does anyone know how to pick up such changes via an automated
> > link checking process?
>
> Get your link checker to make an MD5 checksum of the page each time it
> pulls it and then compare the result with previous results. This will
> tell you when any of the HTML is changed. Of course sites that include
> HTML that is generated on the fly will always appear to be changing but if
> you point to mainly static web pages it'll give you a reasonable first
> line filter.
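
A minimal sketch of the checksum comparison described above, in Python. The function names and the in-memory dict of stored digests are assumptions for illustration only; fetching the pages and saving the digests between runs are left out.

```python
import hashlib


def page_digest(body):
    """Return the MD5 hex digest of a page body given as bytes."""
    return hashlib.md5(body).hexdigest()


def changed_pages(current, previous):
    """Compare freshly pulled pages against stored digests.

    current  -- dict mapping URL -> page body (bytes) from this run
    previous -- dict mapping URL -> digest from earlier runs (updated in place)

    Returns the URLs whose content differs from the last run. A URL seen
    for the first time is recorded but not reported as changed.
    """
    changed = []
    for url, body in current.items():
        digest = page_digest(body)
        if url in previous and previous[url] != digest:
            changed.append(url)
        previous[url] = digest
    return changed
```

Any byte-level difference changes the digest, so pages with generated content (dates, counters) will be reported on every run, as noted above.
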
Too much noise? Why not simply get your link checker to look for the word
'sex' in each page it pulls, do a manual check on any hits, and maintain a
list of known exemptions? (And do your link checking via the JWCS to keep
your first-level links in the cache.)
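
Something along these lines would do as a rough sketch, in Python. The function name, the whole-word and case-insensitive matching, and keeping the exemptions as a set of URLs are all assumptions on my part; pulling the pages is left to the link checker.

```python
import re

# Assumption: match 'sex' as a whole word, ignoring case, so that place
# names such as "Sussex" or "Essex" do not produce hits.
KEYWORD = re.compile(r"\bsex\b", re.IGNORECASE)


def pages_to_review(pages, exemptions):
    """Return the URLs that need a manual check.

    pages      -- dict mapping URL -> page text pulled by the link checker
    exemptions -- set of URLs already checked by hand and known to be fine
    """
    return sorted(
        url
        for url, text in pages.items()
        if url not in exemptions and KEYWORD.search(text)
    )
```

Once a hit has been checked by hand and found harmless, add its URL to the exemptions and it will not be flagged again.
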
Andy.
--
UK Office for Library and Information Networking
University of Bath, Bath, BA2 7AY, UK Voice: +44 1225 323933
http://www.ukoln.ac.uk/ukoln/staff/a.powell/ Fax: +44 1225 826838