We are doing this in a very limited way on specific parts of the website, though we just rolled our own solution using wget. The archives we keep are mainly for admissions and the REF exercise, and we don't really want them to be public at the moment, which is why we didn't consider external solutions. By using wget and some post-crawl scripts we can control exactly what is archived: we didn't want to archive too much of the site, but where some important links led off into other areas we wanted to be able to scrape those in if needed.
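A minimal sketch of this kind of scope control, assuming the decision is made per URL before (or after) the crawl. It is an illustration only, not the scripts described above: the host name, path prefixes and extra URL are hypothetical placeholders.

```python
from urllib.parse import urlsplit

# Hypothetical values for illustration; none of these come from the original post.
ARCHIVE_HOST = "www.example.ac.uk"
ARCHIVE_PREFIXES = ("/admissions/", "/research/ref/")
# Important pages outside the core sections that should be pulled in anyway.
EXTRA_URLS = {"https://www.example.ac.uk/fees/undergraduate.html"}


def in_core_scope(url):
    """Return True if the URL sits inside one of the sections chosen for archiving."""
    parts = urlsplit(url)
    return parts.hostname == ARCHIVE_HOST and parts.path.startswith(ARCHIVE_PREFIXES)


def should_archive(url):
    """Keep core-section pages plus the explicitly listed extras, nothing else."""
    return in_core_scope(url) or url in EXTRA_URLS


def filter_urls(urls):
    """Return the in-scope URLs, de-duplicated, in their original order."""
    seen = set()
    kept = []
    for url in urls:
        if should_archive(url) and url not in seen:
            seen.add(url)
            kept.append(url)
    return kept
```

The filtered list could then be written to a file and fetched with wget's `-i` (`--input-file`) option, so that only the chosen sections and the hand-picked extras end up in the archive.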
I'd be interested in other opinions though, as I think this is a growing requirement and I'm not sure we have an ideal solution (though it does work well for our current needs).
I would also ask how much of the site needs archiving. Some areas are (I'm sure) far more important to archive than others, and I'm not sure it's a good idea to capture everything for ever. Some sort of governance (what to archive, how long for, removal policy etc.) would be useful, but this isn't something we have in place yet.
Nick
......................................................................................
Dr. Nick Mattin
Services Development Manager
University of Cambridge,
Information Services,
Roger Needham Building,
7 J J Thomson Avenue,
Cambridge
CB3 0RB
Tel: +44 1223 766210
-----Original Message-----
From: Managing institutional Web services On Behalf Of John Greenaway
Sent: 24 September 2013 08:36
Subject: How do you handle web archiving?
Hi,
We're embarking on a large web revamp programme, including moving to a new CMS. Given this seems reasonably common in HE, what have other institutions' approaches to archiving old content been?
We don't want to keep an old CMS (or several) going, so the options seem to be:
1. Burn it all.
2. Archive with a 3rd party service, such as http://www.archive-it.org. Out of these http://www.webarchive.org.uk/ukwa/ seems very promising, as JISC is involved, and the British Library etc. should know a bit about archiving.
3. Archive in-house and run our own Wayback Machine equivalent - either everything, or selected parts. Possibly using Internet Archive's Heritrix crawler, with Wayback or Archivematica as an interface.
Anyone had experience of 2 or 3?
Cheers,
John
--
John Greenaway
Development Manager
University IT
Cardiff University