Hi,
Apologies for crossposting but I got found out.
In the last few weeks, and particularly between the 4th and 7th of November,
you might have noticed increased activity on your websites. Requests, at a rate
of one every 10 seconds, would have come from somis4.ais.dundee.ac.uk.
We are testing a new search engine (Search Maestro 3) with the aim of indexing
all of *.ac.uk. In October/November we had many partial runs as the robot kept
crashing. It eventually managed to go the distance a few days ago: 3.8M docs
have been downloaded and indexed. The set excludes identical duplicates and
URLs with question marks.
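
For anyone curious what that filtering amounts to, here is a minimal sketch in
Python. It is not the robot's actual code (Search Maestro 3 is written in
Java); the names should_fetch and DuplicateFilter, and the choice of a SHA-1
digest to spot byte-identical pages, are assumptions made for illustration.

```python
import hashlib


def should_fetch(url: str) -> bool:
    """Return False for URLs containing a question mark (query strings)."""
    return "?" not in url


class DuplicateFilter:
    """Remembers a digest of every page body seen so far.

    Assumption: "identical duplicates" means byte-for-byte identical bodies,
    so a content hash is enough to detect them.
    """

    def __init__(self) -> None:
        self._seen = set()

    def is_new(self, body: bytes) -> bool:
        """Return True the first time a body is seen, False on repeats."""
        digest = hashlib.sha1(body).hexdigest()
        if digest in self._seen:
            return False
        self._seen.add(digest)
        return True
```

A crawler would call should_fetch before requesting a URL and is_new before
indexing the downloaded page.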
Search Maestro 3 will be revealed when the user interface is completed and some
features ;-) have been fixed. It is written in Java and it will be free to
download and use under open source or some other type of licence.
Could I please ask those web persons using betsie, and other similar tools which
generate text pages, to put relevant exclusions in their robots.txt files. If
you care to, please also exclude your stats pages and similar. (Too much noise
... and reduction of speed.)
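
As a rough illustration of the kind of exclusions meant here, the Python sketch
below embeds a sample robots.txt and checks it with the standard library's
urllib.robotparser. The paths /betsie/ and /stats/, the host
www.example.ac.uk and the agent name ExampleRobot are all hypothetical;
substitute whatever your own site uses.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the paths are examples only, not real locations.
ROBOTS_TXT = """\
User-agent: *
Disallow: /betsie/
Disallow: /stats/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(path: str) -> bool:
    """Check whether a well-behaved robot may fetch the given path."""
    # "ExampleRobot" is a placeholder agent name; the "*" rules apply to it.
    return parser.can_fetch("ExampleRobot", "http://www.example.ac.uk" + path)
```

With these rules, allowed("/index.html") is true while anything under
/betsie/ or /stats/ is refused.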
Regards
Charles
PS. If you know of any robot that is capable of downloading (storing and
ideally comparing for duplication) more than 5M pages, please let me know.
==============================================
Charles Christacopoulos, Management Information Officer,
Planning & Information, University of Dundee, Dundee, DD1 4HN,
Scotland, United Kingdom. Tel: 44(0)1382-344891. Fax: 44(0)1382-201604.
http://www.somis.dundee.ac.uk/