Last week I explored what precisely makes up the 20-year archive of the web held in the Internet Archive’s Wayback Machine. Several of those findings have spawned considerable discussion over the past week within the library and web archival communities about what it means to archive the web, how much documentation and metadata is enough, the tradeoffs between completeness and reach, and how to better engage with the myriad constituencies served by web archives.

Why is it so important to understand what’s in our web archives? Perhaps the most important reason is that the web is an infinite and ever-changing landscape, so it is simply impossible to archive the “entire internet” and perfectly preserve every change to every page in existence. Web archives are by their very nature an imperfect record of the web, and constructing them is an exercise in countless tradeoffs over how to preserve an infinite stream with finite resources.


http://onforb.es/1QMw9XQ




--
Peterk
Dallas, Tx
[log in to unmask]
Save our in-boxes! http://emailcharter.org
"The problems of our economy have occurred not as an outgrowth of laissez-faire, unbridled competition. 
They have occurred under the guidance of federal agencies, and under the umbrella of federal regulations."
Senator Ted Kennedy, in defending trucking deregulation in 1978.
Contact the list owner for assistance at [log in to unmask]

For information about joining, leaving and suspending mail (eg during a holiday) see the list website at https://www.jiscmail.ac.uk/cgi-bin/webadmin?A0=archives-nra