I think that James Cummings missed my point by a mile. Perhaps I did not
word it clearly enough for him.
My problem was two-fold:
1. How to store data electronically and permanently in a manner that will
allow future retrieval. Katrina took my rather large selection of 5 1/4"
floppies that I had not completed copying onto 3 1/2" floppies. What do you
know? My new PC does not have a 5 1/4" drive, and when I replace my current
PC I am advised that few will offer 3 1/2" drives--just CDs...unless they
decide to offer only DVD drives. A friend of mine a while back was able to
purchase the complete works of Beethoven and several others of his ilk to
play on his tape recorder. I hope that he maintained his tape recorder
well. Right now I cannot get an answer when I start asking about how long
certain data recorders and players will be available other than "You will
just have to keep up with the times." That answer, to my mind, makes the
storing of petabytes and exabytes problematical--NOT because we cannot store
them right now but because I fear that we will not be able to retrieve the
data in the future.
2. Whom do we trust to maintain the storage? I use the SF librarian as
a bad example. I did not expect to have someone agree with the librarian
that Danielle Steel and Barbara Taylor Bradford should replace Jane Austen
and the Brontë sisters. There goes the library as a source of cultural history
and community cultural literacy.
Furthermore, I consider the size issue to be somewhat of a red herring:
I do not believe that more is always better nor do I know of any Classicist
or Medievalist who does. I was in a Mensa group that later incorporated itself
as a museum of antiquities. The guys in charge had a very large collection
of 2000-year-old oil lamps in good condition. Several lamps in very good
condition were set aside for display at the future museum--the rest were for
sale because they added nothing. A note that they were common in that
period sufficed for everything else.
We need representative samples and we need diversity--both the samples
themselves and commentary on the type, degree, and location of the diverse
samples and on how widespread that diversity was. We cannot learn that much more
from 100,000 identical oil lamps from Jerusalem than we can from 10;
however, any and all fragments of clay seals and vellum or parchment might
well be of great value.
Scott Catledge
-----Original Message-----
From: The Digital Classicist List [mailto:[log in to unmask]]
On Behalf Of James Cummings
Sent: Thursday, October 15, 2009 5:27 AM
To: [log in to unmask]
Subject: Re: [DIGITALCLASSICIST] How much server space would the Classical
world occupy?
Hi all, [Long, apologies]
I think I'm still concerned about the notion of 'size' with
respect to textual resources. Ok, let's say the IADM database is
1gb, which these days is indeed a trivial amount of space. Is
that 1gb in the database's format? What about an SQL or XML
export? That would probably be bigger since one assumes the
binary format that the database uses achieves some optimization.
But what if I used a clever compression algorithm on the SQL
dump? That might make it significantly smaller. I suppose there
is nothing wrong with having a rough idea, especially to compare
it to another discipline where a similarly collected rough idea
is being used. And the various eResearch people will tell us
that they are dealing with data streams in the order of gigabytes
per second... somehow implying that their data is therefore more
important or better. Let's not fall into the trap of
believing that more is better... how many sherds of Samian ware
are there that introduce no real new knowledge other than a sense
of scale? I think size might be a poor comparator for usefulness
in preservation of knowledge about our cultural heritage. While
a stream of astro-physics data may return a huge amount of data
in size, the number of factors being measured may be quite
limited, whereas a single textual resource may contain a huge
number of new pieces of information, corroborations of existing
knowledge, or contradicting and problematising details. I think
the complexity of the data is relevant to its perceived worth and
to its claim on long-term preservation.
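James's point that the "size" of a textual resource depends on serialization and compression is easy to demonstrate. Here is a minimal Python sketch using toy data (the table and column names are hypothetical, purely for illustration): the same repetitive, SQL-dump-like text shrinks dramatically under gzip.

```python
import gzip

# A toy stand-in for an SQL dump: repetitive INSERT statements,
# the kind of highly redundant text a real export would contain.
# (Table and column names here are invented for illustration only.)
rows = [
    f"INSERT INTO finds (id, context, material) VALUES ({i}, 'C{i % 50}', 'samian');"
    for i in range(10_000)
]
dump = "\n".join(rows).encode("utf-8")

compressed = gzip.compress(dump, compresslevel=9)

print(f"raw dump: {len(dump):,} bytes")
print(f"gzipped:  {len(compressed):,} bytes")
print(f"ratio:    {len(dump) / len(compressed):.1f}x smaller")
```

The exact ratio depends on the data, but the point stands: a single gigabyte figure for a database says little unless we also say which format was measured and whether it was compressed.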
I disagree with Scott Catledge that storing Petabytes and
Exabytes of information will be that problematic. Such storage
(in redundant manners) is already possible should the desire for
preservation be enough to produce the funding to secure it. It
is the funding of centralised storage for humanities disciplines
which is unlikely in the medium term, not the ability to do it.
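To give that feasibility claim a rough number, here is a back-of-envelope sketch (all figures below are illustrative assumptions, not quotes for any real system): one petabyte stored in triplicate fits on a modest stack of commodity drives.

```python
PB = 10**15              # one petabyte, decimal convention (bytes)
TB = 10**12              # one terabyte (bytes)

archive = 1 * PB         # hypothetical archive size
replicas = 3             # redundant copies -- a common preservation rule of thumb
drive_capacity = 2 * TB  # an ordinary commodity drive (assumed size)

drives_needed = (archive * replicas) // drive_capacity
print(f"{drives_needed} x 2 TB drives for 1 PB stored in triplicate")
```

At this scale the hardware is the easy part; the hard part, as noted above, is funding and organising its curation over decades.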
The librarian you mention is doing what librarians do: they have
a set amount of space and weed their collection to fit --
librarians are not (necessarily) archivists. Is the answer then
for each of us (where 'us' includes individuals, projects, and
institutions) to store and make available the data that is
perceived as important by us? Then the challenges are in the area
of interoperability and the exposure, authentication, and
integration of metadata and data in some file-sharing
cloud-computing data web wonderland where we each fund the
preservation of what we believe to be the 'important' data.
There is little technical challenge in doing this on a much more
significant scale than digital classicism has reached on the web,
the challenges are mostly human, political, and financial. We'd
only have ourselves to blame for the non-preservation of (and/or
failure to fully expose and properly license) data if it is not
available to successive generations, and in many ways that is
already the case. However, this introduces the same sampling
errors that you were mentioning but instead shifts the blame away
from some archivist and onto the community itself. I guess I'm
more of a 'preserve everything we can' sort of person than a
'carefully weed out redundancies' one, which probably explains the
size of the network-attached storage I have at home. :-(
Sorry for the long rambling post,
-James
Melissa Terras wrote:
> Just a follow up on how big the Silchester data set is, from Mike
> Rains at York Archaeological Trust
>
>> The current size of the Silchester IADB database is just over one
>> gigabyte (1080mb), approximately 350,000 records split between Finds,
>> Contexts, Photos, Plans, etc. To this you could add the large amount
>> of stuff (high resolution images, spreadsheets, Word documents,
>> Illustrator files, etc) on the file server in Reading which hasn't
>> reached the IADB yet. We're probably about two thirds of the way
>> through Silchester, so that by the end of the project I wouldn't be
>> surprised if the total goes over 2gb (or more - every year I think we
>> generate more digital data than in previous years). I would guess
>> that the Insula IX excavation represents less than 1% of the total
>> area of Silchester.
>
> M
>
> 2009/10/14 Scott <[log in to unmask]>:
>> I believe that we need to prepare for exabytes of information; petabytes
>> will run out too quickly.
>>
>> The most crucial question should be how are we going to store this amount
>> of data--and whom can we trust to decide what to maintain and what to
>> discard. I heard a secondary school librarian brag that she had rid her
>> library of all the obsolete books by Hardy, Austen, and the Brontë
>> sisters to make room for the great classics being written now that are
>> pertinent to our new society. I remarked to her, "Adolf Hitler and
>> Stalin would be proud of you." Before I am bombarded by angry
>> librarians, they should know that I have only the greatest respect for
>> librarians in general and the ones that I know now in particular. The
>> librarian in question was from San Francisco.
>>
>> Despite the claim by some Classicists that they only need a small
>> sample on which to build their theories, I have read too many satires
>> on the USA in which extraterrestrial or 3rd or 4th millennium Terran
>> archeologists explore our civilization and try to recreate our times
>> and manners, usually with very credible answers that could hardly be
>> more inaccurate. A poll was done in 1957 on the geographic knowledge
>> of college freshmen by selecting one school in each state and
>> interviewing the first willing freshman. I was selected at MSC and
>> answered all of the questions correctly and pointed out that two of
>> the questions had more than one answer (the capitals of The
>> Netherlands and Bolivia)--I was very interested in political geography
>> from elementary school on. Mississippi was rated as having the most
>> knowledgeable students in the USA in the field of geography--and I
>> went to high school in FL. This poll, which was quickly discounted for
>> many reasons--not all valid--is an excellent example of making wide
>> statements on an invalid sample. Just how do any historians--I am more
>> of a Medievalist than a Classicist--decide that their limited samples
>> are sufficient to make a conclusion? I hedge my bets by making my
>> sample equal to my population (e.g., the listed names on a particular
>> codex) or by generalizing (e.g., I wrote an article on the quadra
>> nomina just to show that agnomina and other such names were used, in
>> reply to an article that stressed the tria nomina and ignored the
>> existence of the other name forms). That they existed was my point--not
>> what percentage of the population had them--that would be another paper.
>>
>> N. Scott Catledge, PhD/STD
>> Professor Emeritus
>> history & languages
>>
>> -----Original Message-----
>> From: The Digital Classicist List [mailto:[log in to unmask]]
>> On Behalf Of Willard McCarty
>> Sent: Tuesday, October 13, 2009 10:15 AM
>> To: [log in to unmask]
>> Subject: Re: [DIGITALCLASSICIST] How much server space would the
>> Classical world occupy?
>>
>> I'd guess that people here know about the genre to which this question
>> belongs, perhaps best exemplified by Michael Lesk's asking "how much
>> information is there in the world?" (Googling for his name and the
>> question will turn up some things which illustrate.) Lesk used to count
>> it in terabytes, but I suppose the figure has gone up somewhat, now that
>> we commonly have terabyte discs. It strikes me, however, that one should
>> also be asking what we would not have if all that can be stored on a
>> hard disc in whatever format were all that there is. What would happen
>> to the library if ALL that we had was the buildings and the books and
>> other resources in them?
>>
>> Yours,
>> WM
>>
>> Melissa Terras wrote:
>>> But you may also want to make the comment that Classicists are *used* to
>>> dealing with data loss, and extrapolating findings from the smallest
>>> scrap available. For example, pay packets and the Roman Army - someone
>>> out there will know better than me, but I remember reading somewhere a
>>> calculation of how many payslips would have been created (millions) and
>>> how many have survived (a handful) - yet we can understand a lot from
>>> the extant material.
>>>
>>> Additionally, it's not good archival practice to keep everything... you
>>> have to make choices about what you will save, and what you will
>>> discard!
>>>
>>> M
>>>
>>> Paradoxographer wrote:
>>>> Hello everyone, and thank you all for your contributions and help.
>>>>
>>>> To answer James' question about motivation ... I'm currently working
>>>> in research in the field of records and information management (though
>>>> a classicist by education and inclination, hence my continued
>>>> membership of this list). I am trying to get a feel for the volume of
>>>> material involved to inform a case I intend to argue in a paper /
>>>> article against the view - common in the records and archives field -
>>>> that we are entering a 'digital dark age' because of our current
>>>> inability to preserve more than a tiny fraction of born-digital
>>>> material. I know that the figures for current rates of information
>>>> creation are not exactly models of precision either, but they are
>>>> frequently bandied about in journals and conferences, and for my
>>>> purposes orders of magnitude will suffice.
>>>>
>>>> And I entirely agree that images, archaeological reports / records,
>>>> etc would have to be taken into consideration for any proper
>>>> assessment: the reason I provisionally excluded them was that I feared
>>>> it was too much like asking 'how long is a piece of string?' and did
>>>> not want to try the patience of the list with impossible questions!
>>>>
>>>> Kind Regards,
>>>>
>>>> Rachel Hardiman
>>>>
>> --
>> Willard McCarty, Professor of Humanities Computing,
>> King's College London: staff.cch.kcl.ac.uk/~wmccarty/
>>