The construction of a web archive begins with defining what the archive’s
purpose should be. Some web archives have very specific inclusion criteria
and focus on narrow topics for which there is a known and limited universe
of content to preserve. Other web archiving initiatives set out simply to
archive what they can through a vast web of sources and donors, without any
overarching collection strategy or user community to guide them. This is in
marked contrast to the approach the library and archival communities
typically take to physical collections.

Some of the largest collections of archived web content are thus
multi-petabyte datasets compiled over years or even decades through
criteria, seed lists, crawler designs, and design decisions, both explicit
and inadvertent, that have long since been lost to time or are considered
proprietary and cannot be shared.


http://bit.ly/2hWIoHK
http://bit.ly/2hWIoHK+


-- 
Peterk
Dallas, Tx
Save our in-boxes! http://emailcharter.org
“If only there were a massive entity that I were forced to fund to tell me
how I should live my life, since I’m so obviously incapable of deciding for
myself.” M. Hashimoto

Contact the list owner for assistance at [log in to unmask]

For information about joining, leaving and suspending mail (e.g. during a holiday), see the list website at
https://www.jiscmail.ac.uk/cgi-bin/webadmin?A0=archives-nra