Last week I explored what precisely makes up the 20-year archive of the web held in the Internet Archive’s Wayback Machine. Several of those findings have since spawned considerable discussion within the library and web archival communities about what it means to archive the web, how much documentation and metadata is enough, the tradeoffs between completeness and reach, and how to better engage with the myriad constituencies served by web archives.

Why is it so important to understand what’s in our web archives? Perhaps the most important reason is that the web is an infinite and ever-changing landscape, so it is simply impossible to archive the “entire internet” and perfectly preserve every change to every page in existence. Web archives are by their very nature an imperfect record of the web, and constructing them is an exercise in countless tradeoffs over how to preserve an infinite stream with finite resources.
