Given the fact that there will also be virtual collections and that
collections may change in time, maybe the collection description rather
than the collection should have a globally unique collection id?
The usage scenario as I see it is the following:
1) we refer to collections in documents
2) we will have databases with collection descriptions.
In the first case we first need to find the collection description and
in there we need to find the services giving access to the collection.
This will probably be a URL to a web page or a base URL for the search
and retrieve service providing the descriptions of the objects in the
collection. For virtual collections I expect that these services will
need an extra parameter to define which part of the collections'
object descriptions to search in.
It is desirable to make the identifier actionable. But what reason is
there to "click" on such an identifier? Probably you know already enough
about the corresponding collection to "click" on its identifier with
the final intention to access its service. When making a document with
references to a collection why not provide immediately the address of
the service (with - if needed - an appropriate parameter indicating the
collection) or the address of a collection description containing the
address of the service? As long as it is clear what this address stands
for (e.g. a URI with an encoding scheme) this is in my opinion in terms of
usability the best globally unique identifier.
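To make that concrete, here is a rough sketch in Python of an identifier that is nothing more than the service address plus a parameter naming the collection. The base URL and the parameter name "collection" are assumptions for illustration only; no existing service is implied.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

def service_address(base_url: str, collection: str) -> str:
    """Build the service address, passing the collection as a query parameter."""
    return f"{base_url}?{urlencode({'collection': collection})}"

def collection_from_address(address: str) -> str:
    """Recover the collection parameter from such an address."""
    return parse_qs(urlsplit(address).query)["collection"][0]

# Hypothetical base URL and collection name.
address = service_address("http://example.org/search", "roman coins")
print(address)
print(collection_from_address(address))
```

The same address then serves both as the reference in a document and as the way in to the service.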
>>> [log in to unmask] 6-8-04 15:52:29 >>>
(Stepping in with trepidation for the first time)
From my reading of this thread, it is clearly going to be very
difficult, and quite likely impracticable in the foreseeable future to
achieve a globally unique standard id for collections. Fundamentally, the
options seem to be either to find/establish a registration agency, or to
devise a failsafe method for anyone to generate the same id for a
collection (possibly incorporating existing ids such as ISIL or http
domain names). Problems with both of these have been amply explained.
So perhaps we should consider more fully why a unique id might be
needed. As a non-expert in creating or managing collection
descriptions, I can only go on what I've read so far.
Pete said 'The purpose of assigning an identifier to a collection is
that the collection can be referred to unambiguously.' But referred to
by whom or what, and why? The only answer I can see to these questions
so far is where Juha cites automated record exchange between
applications as the reason for needing globally unique identifiers. I
assume that behind this is a concern to avoid the duplication of
records in metasearch system databases.
So, how important is this? If only a few duplicates are going to occur
then the world could probably live with them for the time being. But
are they going to proliferate out of control as records get passed around
systems? Could records be de-duped using other data elements, or at
least flagged as potential duplicates? Are there any other reasons for
needing a globally unique collection id?
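As a rough illustration of flagging potential duplicates from other data elements, the Python sketch below groups records by a normalised title and owner. The field names ("id", "title", "owner") and the matching rule are assumptions made up for the example; a real system would need a more careful comparison.

```python
from collections import defaultdict

def normalise(text: str) -> str:
    """Lower-case the text and collapse runs of whitespace."""
    return " ".join(text.lower().split())

def flag_potential_duplicates(records):
    """Return lists of record ids that share a normalised title and owner."""
    groups = defaultdict(list)
    for record in records:
        key = (normalise(record["title"]), normalise(record["owner"]))
        groups[key].append(record["id"])
    return [ids for ids in groups.values() if len(ids) > 1]

# Hypothetical records received from two different sources.
records = [
    {"id": "a:1", "title": "Roman Collection", "owner": "Museum X"},
    {"id": "b:7", "title": "roman  collection", "owner": "museum x"},
    {"id": "a:2", "title": "Manuscripts", "owner": "Museum X"},
]
print(flag_potential_duplicates(records))
```

Records flagged this way are only candidates; someone (or some further rule) still has to decide whether they describe the same collection.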
Information Standards Manager, Talis
[log in to unmask]
+44 (0)121 471 1179
+44 (0)776 974 0077 (mobile)
From: DCMI Collection Description Group
[mailto:[log in to unmask]] On Behalf Of Gordon Dunsire
Sent: 06 August 2004 13:01
To: [log in to unmask]
Subject: Re: Collection identifier and language codes
There is a definite requirement to allow non-owning agencies to assign
identifiers:
1. Collection description service providers may need to create CLDs for
operational purposes: often "collections of collections" (say a fixed
landscape in the information environment) or sub-collections (say the
sub-collection of material that is in the manuscripts catalogue).
2. Supra-organisations may need to create CLDs; for example the CURL
collection is the material that is described by COPAC (the CURL OPAC).
CURL doesn't own the collections.
3. Many organisations will simply not be in a position to create their
own global identifiers, and may not formally or explicitly delegate
responsibility to do so. There must be allowance for external bodies to
define and identify whatever collections they need to. For example, if
the government carries out an audit of all Roman materials held in
Scottish museums, it may define sub-collections ("Museum X Roman
collection") which are not recognised or required by the museum.
What is being identified is a major issue. Dis-aggregation and
re-aggregation is necessary for service operations. Strictly speaking,
a collection of 3 items (ABC) can generate 7 collections (ABC, AB,
AC, BC, A, B, C). Many will never need to be recorded, but a
significant number will. There are many issues surrounding the
relationships of collections to
sub- and super-collections, and I would not advise using "intelligent"
identifiers to store this information.
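A quick check of the count in the ABC example, as a Python sketch: the non-empty sub-collections of n items number 2^n - 1, so the total grows very quickly with the size of the collection. The function name is just for illustration.

```python
from itertools import combinations

def sub_collections(items):
    """List every non-empty sub-collection, largest first."""
    items = list(items)
    return [
        "".join(combo)
        for size in range(len(items), 0, -1)
        for combo in combinations(items, size)
    ]

subsets = sub_collections("ABC")
print(subsets)
print(len(subsets), 2 ** 3 - 1)
```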
So I agree with Pete that the global identifiers should be opaque.
And any agency should be allowed to create an identifier for any
collection they wish to, whether they are connected in any way to it
or not.
Which means that there will, indeed, be several unique identifiers for
the same collection; identifying agencies should be encouraged to
existing identifiers. It might be possible to set-up a co-ordinating
body, but I don't see how it could be effective given the potential
combinatorial explosion illustrated in the ABC example above.
The http approach is attractive. Domain name stability is likely to be
less of a problem if identifying agencies are based on existing,
organisations. These may be service providers (e.g. SCONE can already
provide unique identifiers for many institutional collections in
Scotland) or supra-institutional bodies (the Scottish Library and
Information Council, Scottish Museums Council, and National Archives of
Scotland, between them covering most of the SCONE collections; or MLA
for England and Wales).
Nonetheless, this approach will inevitably lead to "dead" identifiers
(no domain lasts forever), but multiple identifiers for the same
collection may alleviate this as a problem.
Centre for Digital Library Research
t: 0141 548 4680
f: 0141 552 5330
e: [log in to unmask]
From: DCMI Collection Description Group
[mailto:[log in to unmask]]On Behalf Of Pete Johnston
Sent: 05 August 2004 16:42
To: [log in to unmask]
Subject: Re: Collection identifier and language codes
> Your assumption is correct: I propose a simple mechanism for making
> local collection identifiers globally unique.
> Given that there is a fair chance that collection description
> records will be shared between metasearch applications via automated
> means (such as OAI-PMH) on the Internet, we must strongly recommend, or
> even require, that libraries and other organisations describing their
> collections avoid local identifiers. And since there is no
> international standard collection number, and we will not have one in
> the next few years, we should have means of making any local
> identifier a unique one.
> If collection
> identifiers are built from prefix (ISIL) and suffix (local
> string) the resulting identifier is globally unique as long as the
> ISIL code is unique and the organisation does not re-use the
> identifier string.
> This technique of using a prefix in the identifier is similar to that
> utilised in DOI and URN in guaranteeing the uniqueness of identifiers.
> I'm afraid that your assumption that this is a new identifier scheme
> is correct, but then what existing identifier would you use for
> identifying collections?
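A minimal Python sketch of the prefix-plus-suffix idea quoted above. The ":" separator and the ISIL-like strings are invented for illustration (they are not real ISILs), and the set of issued identifiers stands in for whatever record-keeping each organisation actually uses.

```python
def make_collection_id(isil: str, local_id: str, issued: set) -> str:
    """Join an ISIL prefix and a local string, refusing to reuse an identifier."""
    candidate = f"{isil}:{local_id}"  # ':' as separator is an assumption
    if candidate in issued:
        raise ValueError(f"{candidate} has already been assigned")
    issued.add(candidate)
    return candidate

issued_ids = set()
# Two organisations can use the same local string without clashing.
print(make_collection_id("XX-000001", "maps", issued_ids))
print(make_collection_id("XX-000002", "maps", issued_ids))
```

Uniqueness here rests on exactly the two conditions stated in the quoted message: the prefix is unique to the organisation, and the organisation never reuses a local string.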
I suppose to make any proposals for identifier schemes we need to step
back and decide what the "functional requirements" for a collection
identifier are.
(I took a stab at this when I was writing the last message but then
didn't send it. And I'll qualify all of this by stressing that I'm
really not an expert in this area! I'm also borrowing freely from some
work that was done - especially by Andy - on sketching out the
requirements for identifiers for learning objects - though I'm not
assuming that the requirements are the same for collections as for
learning objects.)
The purpose of assigning an identifier to a collection is so that the
collection can be referred to unambiguously. I think collection
identifiers should be
- globally unique: an identifier should identify only one collection -
though a single collection may have multiple identifiers
- persistent: an identifier should continue to identify the same
collection through time (do we need to say for how long? roughly?).
Persistence of the identifier should be independent of changes to
factors such as the ownership of the collection, the location of the
collection, or the services that provide access to the collection.
- easy/cheap to assign: an agent (I'm not sure whether it is always
the owner of a collection (or their delegate) who is assigning
identifiers or whether we may need to allow for other agencies to assign
them?) should be able to assign an identifier in a way which is
convenient for them, and does not require them to engage in
work (this could maybe extend to something about the desirability of
incorporating/qualifying existing local identifiers?)
- assignable in a devolved environment: multiple agents should be able
to assign identifiers independently of one another without the risk of
their using the same identifier for two different collections
- easy to cite: a human user should be able to cite a collection
identifier relatively easily - in both a digital and non-digital context
- so they should be relatively easy to communicate verbally or in writing
I'd suggest it is desirable (but maybe not strictly necessary) that
collection identifiers are
- actionable: a user can expect to "resolve" a collection identifier
using a digital service, and so obtain access to a digital object that
provides some information about the identified collection
- easy/cheap for the user to action: so a user call to a resolution
service should make use of widely available desktop tools e.g. a Web
browser
And the identifier scheme should be
- scaleable: the identifier scheme should support the identification of
an unlimited number of collections
- based on standards
Is that a reasonable set of requirements? I'm sure there are others.
(See below for my thoughts on embedding intelligence/metadata in the
identifier.)
As for what schemes meet those requirements.... I guess I've come to
find arguments for the use of HTTP URIs quite persuasive. So e.g.
http://example.org/collections/ or http://collections.example.org/ (or
similar) followed by a "local" identifier string (with whatever encoding
is required for URI/URL syntax rules).
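For what it's worth, a small Python sketch of that construction: a base HTTP URI followed by a percent-encoded local identifier. The base URI is the example.org one above; the local string "MS 12/Roman" is invented, and encoding every reserved character (including "/") is my assumption about what "whatever encoding is required" would mean in practice.

```python
from urllib.parse import quote

def collection_uri(base: str, local_id: str) -> str:
    """Append a percent-encoded local identifier to a base HTTP URI."""
    # safe="" also encodes "/" so the local string stays a single path segment
    return base.rstrip("/") + "/" + quote(local_id, safe="")

print(collection_uri("http://example.org/collections/", "MS 12/Roman"))
```

The same function would work unchanged for a PURL-style base.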
The argument against this (or one of the arguments against it ;-)) is
that this relies on
(a) the current owner of the domain name maintaining ownership of that
domain name, and
(b) the owner of the domain name (or their delegates) managing URIs
within that domain effectively so as to guarantee persistence and
uniqueness.
Another option would be the use of PURLs -
http://purl.org/exampleorg/collection/ (or similar) followed by the
local identifier string (with whatever encoding is required for
URI/URL syntax rules).
However, if we are really looking at persistence in terms where we are
thinking beyond the lifetime of the HTTP protocol, then I guess there is
an argument that we should be looking at an identifier scheme that is
not associated with that protocol, as you suggest here:
> On the other hand, we might register a URN namespace for collection
> identifiers and avoid the problem with the new identifier schema (see
OK, but see my comment below.
> The ISIL standard does not state too explicitly who is entitled to
> an ISIL. This was probably intentional, and we should regard this
> opaqueness as a strength; I do not think that Leif Andresen would
> easily refuse any organisation with a valid need for an ISIL.
> I disagree that ISILs are not actionable. I (or Leif
> Andresen) could register a URN namespace for ISIL.
OK. I meant from what I read on the ISIL website there was nothing to
indicate that they were actionable. Yes, I agree that if there was a
corresponding URN namespace, they become potentially actionable.
> The thorny
> part would be to describe how to find ISIL resolution services. There
> will not be a global library database, but a national one will do,
> since ISIL gives a strong hint (country
> code) as regards where to look for the library data. You would need
> just one DNS record per country telling where to find the library
> information database. The same decentralised mechanism was defined and
> accepted for ISBN-based URNs (see RFC 3187). ISIL-based collection
> identifiers would require another URN namespace, but could in
> principle be resolved to collection metadata and/or location, if there
> is a national collection metadata register, or - even better - the
> global union catalogue for this data. And using URNs would remove the
> need of defining a new schema.
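A rough Python sketch of the routing step described in the quoted text: take the country prefix of an ISIL-based identifier and use it to pick a national register. The quoted proposal uses one DNS record per country; the dictionary below merely stands in for that lookup, and the register addresses and the identifier are invented for illustration.

```python
# Stand-in for "one DNS record per country"; the addresses are hypothetical.
NATIONAL_REGISTERS = {
    "FI": "http://fi.registers.example.org/collections",
    "DK": "http://dk.registers.example.org/collections",
}

def resolver_for(identifier: str) -> str:
    """Pick the national register from the prefix before the first hyphen."""
    prefix = identifier.split("-", 1)[0].upper()
    try:
        return NATIONAL_REGISTERS[prefix]
    except KeyError:
        raise LookupError(f"no register known for prefix {prefix!r}") from None

print(resolver_for("FI-Example:coll-1"))
```

Note that this only tells you where to ask; what the register returns (metadata, a location, or both) is a separate question.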
OK.... but I do kind of worry that we are heading towards registering
new URN namespaces every time we want to identify a new resource type.
Do we really _need_ a URN namespace specifically for collection
identifiers? What does it give us that use of an existing scheme (e.g.
appropriately managed use of PURLs) does not?
(I mean those as genuine questions that we should try to answer.)
> Most of us probably agree that collections must be identified. And
> there are probably also quite a few people around who think that in
> order to be truly useful these identifiers must be globally unique.
Yes, I agree ;-)
> Lacking a standard
> identifier for collections (maybe we should start developing such a
> standard ASAP) our best current opportunity is, I think, a combination
> of ISIL and local string. This might even work as the future
> The collection identifier should be "intelligent" in the sense that it
> carries information about the organisation which owns the collection,
> and ISIL is the best and only standard we have for naming the
Why is it necessary that the identifier for a collection carries
information about its owner?
I'm happy with the notion that such a convention provides a means of
facilitating the _generation_ of the unique identifier, i.e. as a
scoping/disambiguation mechanism for a "local" identifier.
But I'm less enthusiastic about the idea that the identifier that
results is treated as carrying intelligence (i.e. metadata about the
ownership of the collection), rather than as an opaque string.
What happens when a collection changes ownership?
Does the new owner assign a new identifier based on their ISIL?
That's OK, we can manage a one-to-many relationship between resources
and identifiers (but it would be nice not to generate identifiers
Does the first identifier (based on the previous owner's ISIL) continue
to identify the collection? Or does the previous identifier cease to
identify anything? Is it then available for reuse by the first owner to
identify a new collection?
Hopefully, the first identifier _does_ continue to identify the
collection, otherwise it was not persistent and all my references to
that identifier break. (I don't mean they don't de-reference: I mean the
owner has said they no longer identify what I thought they identified
and in the worst case has (re)used them to identify other collections.)
But if the first identifier does continue to identify the collection,
then the ISIL part of that identifier no longer signals the current
owner of the collection, so the "intelligence" aspect of the
identifier is lost.
If OTOH the identifier is treated as an opaque string, that problem
does not arise.
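To illustrate the opaque-string view, here is a small Python sketch of a register in which identifiers are just keys and ownership is ordinary metadata. The class, its method names and the sample identifiers are all made up for the example; it is not a description of any existing system.

```python
class CollectionRegister:
    """Maps opaque identifiers to collection records; owner is plain metadata."""

    def __init__(self):
        self._records = []      # one dict per collection
        self._by_id = {}        # identifier -> index into _records

    def register(self, identifier, title, owner):
        if identifier in self._by_id:
            raise ValueError(f"{identifier} already identifies a collection")
        self._records.append({"title": title, "owner": owner})
        self._by_id[identifier] = len(self._records) - 1

    def add_identifier(self, existing, new):
        """Give an already registered collection a further identifier."""
        if new in self._by_id:
            raise ValueError(f"{new} already identifies a collection")
        self._by_id[new] = self._by_id[existing]

    def change_owner(self, identifier, new_owner):
        self._records[self._by_id[identifier]]["owner"] = new_owner

    def resolve(self, identifier):
        return self._records[self._by_id[identifier]]

register = CollectionRegister()
register.register("XX-000001:roman", "Roman collection", "Museum X")
register.change_owner("XX-000001:roman", "Museum Y")
register.add_identifier("XX-000001:roman", "XX-000002:r1")
print(register.resolve("XX-000001:roman"))
```

The first identifier keeps resolving to the same collection after the change of ownership, and the second identifier simply points at the same record.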