Dear All,
I received three suggestions about "scraping" web sites for links. Two
require quite a lot of programming knowledge but look as though they
could be tuned to give exactly what I want. These are at:
http://stackoverflow.com/questions/2804467/spider-a-website-and-return-urls-only
(using wget/Unix).
and
http://scrapy.org (scrapy/python).
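For either of the programmatic routes, the core task is just pulling
the href values out of fetched pages. As a minimal sketch of that step
using only Python's standard library (no Scrapy install needed; the
HTML and URLs below are made-up examples, not from any real site):

```python
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collect the href value of every <a> tag seen in the HTML."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

# Hypothetical page content for illustration only
html = '<p><a href="http://example.org/a">A</a> <a href="/b">B</a></p>'
parser = LinkExtractor()
parser.feed(html)
print(parser.links)  # ['http://example.org/a', '/b']
```

Fetching the pages themselves (and following the links recursively) is
what wget or Scrapy would add on top of this.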
Given my technical skills, it may take me a while to figure these out
and see if I can actually get them to do what I need (not just the
commands but installing and making all the relevant software work). The
third option is web-based:
http://www.issuecrawler.net/
and I have already been able to sign up and start a free crawl so that
gets my vote for now. If I can't make it do what I want (crawls are
free only for the first 10, so it could get pricey to scrape lots of
sites this way), then I'll have to get stuck into the more nitty-gritty
approaches.
Many thanks to: Annie Waldherr, David Sherlock and Giovanni Luca
Ciampaglia.
All the best,
Edmund
--
Edmund Chattoe-Brown (Department of Sociology, University of
Leicester, UK)
[log in to unmask]
--
http://www.fastmail.fm - The way an email service should be