JISCMail - CETIS-METADATA Archives


CETIS-METADATA@JISCMAIL.AC.UK

Subject:
Re: Identifiers again (a global, persistent but non-unique debate)
From:
Rob Wilson <[log in to unmask]>
Reply-To:
Rob Wilson <[log in to unmask]>
Date:
Sun, 3 Oct 2004 21:01:51 +0100
Content-Type:
text/plain
Folks

Given that this is good and productive activity, thanks to Dan Rehak,
Andy Powell, Mike Collett, Kerry Blinco, Larry Lannom et al.

I am going to post this to the DCMI PI list and also to the CETIS
Metadata list.

Essentially, what I see emerging is that:

1. We agree that we consistently output to known URI syntax so that
parsing is achieved.

2. We are not happy or convinced that we have to obtain "registered" URI
status before such identifiers are useful: please see Kerry B's succinct
comments here.

Quote: "The insistence that one can only express an identifier/locator in
one way also frustrates me.  I should be able to choose handle as my
identification systems and resolution protocol of choice, express this in
the appropriate encoding for those who do have the appropriate plug in but
be free to also express this as a less robust http URI to the currently
preferred client proxy for the others."

3. We can give guidance and a specific outline on how the various schemes
will encode; see Dan Rehak's comments and work at the CORDRA site.

4. The other issues, re browsers / look-up: I sense that you cannot be
absolute here; find reason to enhance, not deny, the usefulness of
features / benefits.

And this can be for LOM / resource purposes (terms, metadata elements,
records, resources etc.) and it should not preclude the use of a range of
ID schemes: ARK, PURL, DOI, HDL etc.
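
To illustrate the point Kerry makes about expressing one identifier in more
than one way, here is a minimal Python sketch. It assumes hdl.handle.net is
the HTTP proxy of choice (any other proxy could be substituted) and the
function name handle_encodings is hypothetical, made up for this example.

```python
from urllib.parse import quote

# Assumption: hdl.handle.net is the preferred HTTP proxy; swap in another if needed.
DEFAULT_PROXY = "http://hdl.handle.net"


def handle_encodings(naming_authority, local_name, proxy=DEFAULT_PROXY):
    """Return the same handle as an hdl: URI and as an http URI via a proxy."""
    # Percent-encode each part as UTF-8; "/" inside the local name is kept.
    na = quote(naming_authority, safe="")
    ln = quote(local_name, safe="/")
    return {
        "hdl": f"hdl:{na}/{ln}",          # for clients that understand handles
        "http": f"{proxy}/{na}/{ln}",     # less robust fallback for everyone else
    }


forms = handle_encodings("10.1790", "712276811646")
print(forms["hdl"])
print(forms["http"])
```

The identifier itself (naming authority plus local name) is the same in both
forms; only the encoding and the implied resolution route differ.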

Robin

Co-Chair, DCMI PI

Robin Wilson
TSO > London
-----Original Message-----
From: Daniel R. Rehak [mailto:[log in to unmask]]
Sent: Thu 30.9.04 19:36
To: 'Larry Lannom'
Cc: 'Kerry Blinco'; Wilson, Robin
Subject: RE: CETIS Metadata


Well, at least for CORDRA, it's time to put a stake in the ground.
I started the process of defining the "official" CORDRA document set.
http://www.cordra.lsal.cmu.edu/cordra/docs/

Next up:  CORDRA IDs as a profile of handle and URI and HTTP encoding of
CORDRA IDs.

Larry: do you have any specific thoughts on this?  hdl:<na>/<localname>
versus hdl://<na>/<ln>

For CORDRA system ids, the NA and LN will have some more restrictions.
CORDRA system handles will only be from UTF-8, so there will be no extra
URI encoding from UCS-2.  The HTTP encoding will be the standard HTTP URI
encoding.  The ID will use the standard DCE GUID algorithm, and will be
case insensitive.  The http form will be
http://<cordrresolverdnsname>/<na><ln>
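
A rough Python sketch of the rules just listed, not a CORDRA implementation.
Assumptions: a random (version 4) UUID stands in for "the standard DCE GUID
algorithm", a "/" separates NA and LN in the http form (as in the
hdl:<na>/<localname> form), and the resolver and NA names used in the usage
lines are placeholders.

```python
import uuid
from urllib.parse import quote


def new_local_name():
    # Assumption: a version 4 UUID; the message does not say which GUID variant is used.
    return str(uuid.uuid4())


def normalize(identifier):
    # IDs are case insensitive, so reduce them to one canonical case before comparing.
    return identifier.lower()


def same_id(a, b):
    return normalize(a) == normalize(b)


def http_form(resolver, na, ln):
    # Standard URI percent-encoding of the UTF-8 bytes of each part.
    # The "/" between NA and LN is an assumption of this sketch.
    return f"http://{resolver}/{quote(na, safe='')}/{quote(normalize(ln), safe='')}"


# Placeholder names, for illustration only.
ln = new_local_name()
print(http_form("resolver.example.org", "example.na", ln))
print(same_id(ln.upper(), ln))
```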

Handles assigned to content by some other NA may follow different rules,
but for now I'll defer describing how to do that.  I don't know what the
rest of the ADL Pilot is doing to generate their IDs, but we are using
GUIDs.

Anything else critical to make this real?

Rob -- any thoughts relative to your infrastructure?

        - Dan

> -----Original Message-----
> From: Larry Lannom [mailto:[log in to unmask]]
> Sent: Wednesday, September 29, 2004 03:29 PM
> To: Daniel Rehak
> Cc: Kerry Blinco; 'Wilson, Robin'
> Subject: Re: CETIS Metadata
>
> Dan,
>
> Having thought about all of this now for a bit, I find I do
> have some relevant comments. Given those, I do hope you
> respond to one of the mailing lists as I think these are good
> arguments but I don't want to steal and parrot them
>
> Handle is a URI, just not registered -- well, yes and no. I
> haven't taken the time to study the new URI spec, and there
> are probably encoding parts that I would need time to
> understand, but sticking to the prescribed URI syntax would
> greatly constrict the handle spec, which is quite happy with
> kanji or big5 or any of the other encodings.
> How to indicate the encoding is an issue -- perhaps taken up
> by the IRI spec?
>
> I suppose we could put out some definitive documents on how
> to encode handles in http URIs as a start and then try a URI
> spec and then an IRI spec? I'm tired already.
>
> Not knowing what comes back from http -- the INFO uri isn't
> really an identifier but it does speak a little more directly
> to identifier issues, so at least it would give a place to
> talk about what came back.
> (If you don't know it, it is a syntax for 'fat' urls allowing
> metadata and service requests to be packed into a URL.
> Everything, or most things anyway, can be there by value or
> by reference and the by reference stuff will rely on
> identifiers. There is then an accompanying registry which
> tells you about the identifiers.)
>
> Larry
>
> On Sep 29, 2004, at 10:06 AM, Kerry Blinco wrote:
>
> > Dan
> >
> > thank you for this, I agree this is a good perspective.
> >
> > I feel very strongly (and feel like a broken record) that the issues
> > of identification, resolution, locators, expression/encoding and the
> > protocols need to be completely unpacked.  The use of the term
> > persistent identifier as short hand for actionable, resolvable,
> > persistent over time etc etc has done a great disservice to our
> > ability to deal with these sorts of assertions.
> >
> > The insistence that one can only express an identifier/locator in one
> > way also frustrates me.  I should be able to choose handle as my
> > identification systems and resolution protocol of choice, express this
> > in the appropriate encoding for those who do have the appropriate plug
> > in but be free to also express this as a less robust http URI to the
> > currently preferred client proxy for the others.
> >
> > Kerry
> >
> > At 12:00 AM 29/09/2004, Larry Lannom wrote:
> >> I agree and like the perspective. What comes back from a handle
> >> resolution is precisely defined. What is in the handle record and
> >> what the client does with that data is not tightly defined, but that
> >> is a separable issue.
> >>
> >> I'll also try to send a response to some of the earlier email, where
> >> I might be able to add some useful perspective.
> >>
> >> Larry
> >>
> >> On Sep 27, 2004, at 9:54 AM, Daniel Rehak wrote:
> >>
> >>> A counter argument.  Someone needs to do a scrub for accuracy.
> >>>     - Dan
> >>> -------
> >>>
> >>> Andy's argument is that there are 4 requirements
> >>> 1) an identifier must be "encoded" in a URI scheme
> >>> 2) the URI scheme must be IANA registered
> >>> 3) the identifier resolution must be coupled with the URI scheme
> >>> 4) the URI scheme must be widely deployed.
> >>>
> >>> He concludes that the only URI scheme that meets these requirements
> >>> is the http URI scheme.
> >>> We can concede that (1), (2) and (4) are valid for the http URI
> >>> scheme and that (1) and (4) are good criteria in general and (2) is
> >>> desirable.
> >>>
> >>> Just for fun, let's add another requirement
> >>>
> >>> 5) the resolution process must be well defined and it must be
> >>> possible to properly implement it
> >>> (so that you get the required uniqueness, consistency, correctness,
> >>> performance, etc.)
> >>>
> >>> Later on, Andy gets to (3) -- the coupling of ID resolution with the
> >>> URI scheme.
> >>> He disagrees with Mike's separation here, from a "practical" sense.
> >>> From a modeling sense, separation of identification and resolution
> >>> is critical.  It's more than a theoretical argument.
> >>>
> >>> Let's explore http -- the protocol.
> >>>
> >>> The ID proposal (abstracted) is:
> >>> URI:<scheme><resolver><id>
> >>> E.g., http://<someresolver>/<id> or http://tsoid/xyz
> >>>
> >>> The URI scheme is NOT the http protocol.  Andy's model of coupling
> >>> is to encode the ID in the http URI and then pass the URI over the
> >>> http protocol as the messaging layer of the internet.  This coupling
> >>> (3) causes problems.
> >>> So http://<someresolver>/<id> becomes
> >>>
> >>> On a transport layer, perform DNS resolution to <someresolver> and
> >>> send the message:
> >>> GET <id> HTTP/<version>
> >>> <header>
> >>> <header>
> >>> ...
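
The split described in the quoted text (a host name that goes to DNS, and a
GET message that goes to whatever DNS returns) can be sketched in Python as
follows. This is an illustration only: it builds the request text without
sending anything, the function name describe_request is hypothetical, and
HTTP/1.1 with a Host header is assumed for the version and headers left open
above.

```python
from urllib.parse import urlsplit


def describe_request(uri):
    """Split an http URI into the DNS name to look up and the message to send."""
    parts = urlsplit(uri)
    host = parts.hostname            # the name handed to DNS
    path = parts.path or "/"         # the <id> part travels in the request line
    if parts.query:
        path += "?" + parts.query
    # Assumption: HTTP/1.1 with only the mandatory Host header.
    request = f"GET {path} HTTP/1.1\r\nHost: {parts.netloc}\r\n\r\n"
    return host, request


host, request = describe_request("http://tsoid/xyz")
print(host)
print(request)
```

Nothing in the message says "resolve this identifier"; it is an ordinary GET,
so proxies, caches and redirects treat it like any other request.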
> >>>
> >>> So what happens when we send this http message out for resolution?
> >>> What do we get back?
> >>>
> >>> Well it depends on lots of stuff.  Are there proxies, distributed
> >>> DNS, caches, redirects, headers, responses, etc.?  What are the
> >>> strategies of the web client?  What response headers are coming from
> >>> the servers?  Where will DNS really resolve to?  What caches will be
> >>> used to resolve the request?  What are the policies of the caches?
> >>> When will there be a redirect or content negotiation?  Given the
> >>> widespread deployment of edge servers, proxies, caches, etc., there
> >>> is no guarantee that a simple http protocol request will be resolved
> >>> correctly and consistently.  There simply is NO "resolution" protocol
> >>> in http.  It is not an application protocol for resolution, i.e.,
> >>> there is no way to specify in http how I want "resolution" (versus
> >>> "get") to be performed and how I want to deal with proxies,
> >>> redirects, etc.
> >>>
> >>> Unfortunately, any http based ID resolution process
> >>> (http://hdl.handle.net, http://tsoid.com/) will be subject to the
> >>> whims of the http protocol and its widespread deployment.  While
> >>> they work reasonably well, there are no guarantees.
> >>>
> >>> So it comes back to the critical additional issue (5) -- the
> >>> requirement of a "resolution" protocol that is well defined.  Using
> >>> any of the IANA registered URI schemes seems to violate (5).  They
> >>> are not defined for ID resolution.  They do not provide a well
> >>> defined model to layer an ID resolution application on top of the
> >>> defined application or messaging protocol.
> >>>
> >>> So what to do?  We could "redefine" one of the existing application
> >>> protocols, e.g., http, to define how it is to work when the request
> >>> type is "resolve".  But now we are back to trying to satisfy (4) for
> >>> the new protocol and it's probably not easy to change http.
> >>>
> >>> We could define a new "resolution" protocol but the mapping of URIs
> >>> to application protocols is pretty much 1 to 1, so that would require
> >>> satisfying both (4) and (1).
> >>>
> >>> So let's restate and slightly rework the criteria.
> >>>
> >>> 1) an identifier must be "encoded" in a URI scheme
> >>> 2) the URI scheme must be IANA registered
> >>> 3) the identifier resolution protocol must be specified
> >>> 3a) the resolution protocol must be coupled with the URI encoding
> >>> such that it works properly with the URI encoding
> >>> 4) the URI scheme must be widely deployed
> >>> 4a) the resolution protocol must be easily deployable on top of the
> >>> existing IP infrastructure of the internet.
> >>>
> >>> The solution is the empty set.
> >>> What requirement do we relax?
> >>>
> >>> Where does handle fit?  It seems to satisfy all but (2).  It's a
> >>> URI, there are RFCs to define it, it is deployed and the protocol has
> >>> been shown to run over top of the existing internet IP infrastructure
> >>> so it does not require changes to achieve the required functionality
> >>> of the resolution protocol.
> >>>
> >>>
> >>>
> >>>> -----Original Message-----
> >>>> From: Wilson, Robin [mailto:[log in to unmask]]
> >>>> Sent: Sunday, September 26, 2004 07:25 AM
> >>>> To: Wilson, Robin; [log in to unmask]
> >>>> Cc: [log in to unmask]
> >>>> Subject: RE: CETIS Metadata
> >>>> Importance: High
> >>>>
> >>>> or even repudiation .....
> >>>>
> >>>>         -----Original Message-----
> >>>>         From: Wilson, Robin
> >>>>         Sent: Sun 26.9.04 11:59
> >>>>         To: Wilson, Robin; [log in to unmask]
> >>>>         Cc: [log in to unmask]
> >>>>         Subject: RE: CETIS Metadata
> >>>>
> >>>>
> >>>>         so this is it
> >>>>
> >>>>         comment and repudiation  ....  on the bit in red
> >>>>
> >>>>
> >>>>         So to try and sum up my views in one sentence... any
> >>>> chosen identifier
> >>>>         scheme *must* be a valid URI (to provide syntactic
> >>>> integration with other
> >>>>         Internet standards),
> >>>>
> >>>>         the URI scheme *must* be registered
> >>>>         (to ensure the
> >>>>         uniqueness of identifiers within that scheme)
> >>>>
> >>>>         nonsense
> >>>>
> >>>>         and support for the scheme
> >>>>         *must* be widely deployed within the software
> >>>> infrastructure that makes up
> >>>>         the Internet (to ensure that resolution is transparent
> >>>> to the end-user).
> >>>>
> >>>>                 -----Original Message-----
> >>>>                 From: Wilson, Robin
> >>>>                 Sent: Sun 26.9.04 10:02
> >>>>                 To: [log in to unmask]
> >>>>                 Cc: [log in to unmask]
> >>>>                 Subject: CETIS Metadata
> >>>>
> >>>>
> >>>>                 larry / dan
> >>>>
> >>>>                 see this text below re stuff happening on the
> >>>> JISC  / CETIS metadata space
> >>>>
> >>>>                 [log in to unmask]
> >>>>
> >>>>                 HDL URI has broken out of the stables and we
> >>>> seem to have some momentum which is difficult  to judge  ......
> >>>>
> >>>>                 can you have a looksee and make light comment  ?
> >>>>
> >>>>                 Rob
> >>>>
> >>>>
> >>>> +++++++++++++++++++++++++++++++++++++++++++++++++++++
> >>>> Date:         Sun, 26 Sep 2004 09:01:24 +0100
> >>>> Reply-To:     Andy Powell <[log in to unmask]>
> >>>> Sender:       The CETIS Metadata Special Interest Group
> >>>>               <[log in to unmask]>
> >>>> From:         Andy Powell <[log in to unmask]>
> >>>> Subject:      Re: Identifiers again (a global, persistent but
> >>>> non-unique debate)
> >>>> Comments: To: Mike Collett <[log in to unmask]>
> >>>> In-Reply-To:  <[log in to unmask]>
> >>>> Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
> >>>>
> >>>>
> >>>> On Thu, 23 Sep 2004, Mike Collett wrote:
> >>>>
> >>>>> Here is a contribution to the identifier debate most of which I have
> >>>>> recently used elsewhere that may be of use in this SIG.
> >>>>
> >>>> Mike,
> >>>> thanks... this is helpful.  I disagree with some (perhaps
> >>>> most) of the things you say! :-) ...but thinking thru my
> >>>> response has helped me to understand where our differences
> >>>> lie (I think).
> >>>>
> >>>> What follows is a long and fairly detailed response to the
> >>>> various points you raise in your message.  For readers who
> >>>> have an interest, but who can't be bothered with the detail,
> >>>> the executive summary goes something like this...
> >>>>
> >>>> I think that the key area where we fundamentally disagree has
> >>>> to do with the balance between 'identification' and
> >>>> 'resolution'.  I think that seamless resolution of
> >>>> identifiers by *all* currently deployed Internet/Web software
> >>>> components (browsers, caches, proxies, Web servers, email
> >>>> clients, local services, etc.) is critical to the success of
> >>>> any chosen identifier scheme.  Mike seems to suggest that a
> >>>> clear separation of 'identification' from 'resolution' (to
> >>>> the point that seamless resolution by currently deployed
> >>>> technology isn't necessary?) is more important. I tend to see
> >>>> Mike's viewpoint as being the theoretically correct one -
> >>>> i.e. it should be possible to clearly separate these issues,
> >>>> and being able to do so would lead to a better identification
> >>>> system overall.  But my view is that, in practice, whatever
> >>>> kinds of identifiers we deploy people and software will
> >>>> want/need to be able to resolve them (by which I mean that
> >>>> people and software will expect to be able to retrieve a
> >>>> 'representation' of the resource being identified) - and that
> >>>> means that all the currently deployed technology needs to
> >>>> understand how the resolution mechanism works.  If people
> >>>> can't resolve the identifiers in a completely transparent
> >>>> way, then the chosen identifier scheme will fail because it
> >>>> will not 'work' as people expect it to work.  The only
> >>>> existing identifier that works in anything close to a
> >>>> transparent way (by which I mean that almost *all* software
> >>>> components on the Internet understand how the resolution
> >>>> mechanism works) is the 'http' URI.  So either we use 'http'
> >>>> URIs, or we have to get very widespread uptake of any new
> >>>> identifier scheme by the software infrastructure that makes
> >>>> up the Internet.  At the moment, the proportion of deployed
> >>>> software (browsers, etc.) that understand Handles or DOIs or URNs
> >>>> or whatever is very, very small and this means (IMHO) that they
> >>>> haven't succeeded (yet).  Furthermore, I see very little evidence
> >>>> that these alternative URI schemes will ever be widely deployed.
> >>>> (Note: I'd like to be wrong about this final point).
> >>>>
> >>>> So to try and sum up my views in one sentence... any chosen
> >>>> identifier scheme *must* be a valid URI (to provide syntactic
> >>>> integration with other Internet standards), the URI scheme
> >>>> *must* be registered (to ensure the uniqueness of identifiers
> >>>> within that scheme) and support for the scheme
> >>>> *must* be widely deployed within the software infrastructure
> >>>> that makes up the Internet (to ensure that resolution is
> >>>> transparent to the end-user).
> >>>>
> >>>> There are only a few candidate URI schemes that meet these
> >>>> requirements currently (http, ftp, ...), of which the 'http'
> >>>> URI seems the sensible choice.  This choice would change if,
> >>>> and only if, alternative URI schemes like 'hdl' become
> >>>> registered URI schemes and become supported by the bulk of
> >>>> the software infrastructure that makes up the Internet
> >>>> (browsers, caches, proxies, Web servers, email clients, local
> >>>> systems, etc.).
> >>>>
> >>>>> With regard to identifiers I think we have to differentiate between at
> >>>>> least 3 types of persistence (and uniqueness) that can get muddled up.
> >>>>>
> >>>>> 1. persistence of the resource (concept, event or ephemeral thing)
> >>>>> 2. persistence of the identifier - resource relationship (this may be
> >>>>>    many to 1 but should never be x to many)
> >>>>> 3. persistence of the resolvability of the identifier to something else
> >>>>>    (the resource, metadata or related information)
> >>>>>
> >>>>> 1. Everyone seems happy with the idea that the resource may die long
> >>>>> before the identifier-resource relationship dies.
> >>>>
> >>>> Agreed.
> >>>>
> >>>>> 2. Everyone seems happy with the idea that it is important that the
> >>>>> identifier is globally unique and only ever associated with a single
> >>>>> thing.
> >>>>
> >>>> Yup.
> >>>>
> >>>>> This can be split into the two separate bits:
> >>>>>   a.  namespace governance. There seem to be lots of valid candidates
> >>>>> and some people have their favourite pet namespace. Expressing it in
> >>>>> URI syntax may be helpful. Whether it is IANA registered or not may be
> >>>>> important to some, but if you know it is a Handle for example you have
> >>>>> some faith that it is unique.
> >>>>
> >>>> Well OK... I'll go with that for now... but (to take Handles
> >>>> as a specific example) I'd just like to flag up that I don't
> >>>> understand how I'm supposed to "know" that any given identifier is
> >>>> a Handle unless some syntax tells me that it is.  Perhaps more
> >>>> importantly, I'm not sure that "some faith" is really good enough
> >>>> to ensure long term persistence.
> >>>>
> >>>> And in the context of Internet-delivered services, which is
> >>>> what I assume we are interested in on this list, conformance
> >>>> with the URI syntax is much more than just 'helpful' - it is
> >>>> essential... because URIs are really the
> >>>> *only* globally usable form of identifier on the Internet??
> >>>>
> >>>>> A possible but unlikely problem is that hdl may be used by others in
> >>>>> identifiers expressed as uri syntax. Between most communities, and any
> >>>>> that follow the UKOLN advice,  hdl will be taken as Handle. If this
> >>>>> becomes a real issue two possible solutions are that another IANA
> >>>>> namespace is used if uri is essential and that hdl gets IANA
> >>>>> registration - which seems just a matter of time?.
> >>>>
> >>>> Yes possibly... though registration of the 'hdl' URI scheme
> >>>> has probably seemed like 'just a matter of time' for a long
> >>>> time now!?? :-)
> >>>>
> >>>>>   b. governance of the relationship - this is not so easy without some
> >>>>> kind of authority organisation or agreement between organisations. The
> >>>>> tendency is that the relationship is controlled by the
> >>>>> publisher/creator of the identifier. The persistence of this
> >>>>> relationship is as strong or as weak as the creator of the ID makes
> >>>>> it. In the UK most people would have some faith in for example JISC,
> >>>>> Becta, e-GU or their successors to maintain the relative persistence
> >>>>> of these relationships - even if the organisations change their
> >>>>> (domain) names or disappear as organisations.
> >>>>
> >>>> I may well be missing your point here... but how can JISC
> >>>> maintain the identifier-resource relationship if JISC no
> >>>> longer exists?  It can't can it?  So something else has to
> >>>> take that maintenance on?
> >>>>
> >>>>> So far the use of
> >>>>> URLs has not been reliable as people often change the content at a
> >>>>> given location. Building the domain name (such as tsoid.org.uk)
> >>>>> into the identifier arguably weakens the chance of persistence.
> >>>>
> >>>> Yes agreed.  But this is a feature of the way in which people
> >>>> have chosen to use 'http' URIs - it is not an inherent
> >>>> problem with them.  I have argued in the past, and will
> >>>> continue to argue, that people and organisations that are bad
> >>>> at maintaining the identifier/resource relationship when
> >>>> using http URIs are likely to be just as bad at maintaining
> >>>> the relationship with any other kind of identifier.  The
> >>>> problem is a policy/cultural one, not a technological one.
> >>>> Changing the technology doesn't remove the problem - though I
> >>>> would agree that an additional level of indirection always
> >>>> helps! :-) - it just moves it elsewhere.
> >>>>
> >>>> Note that PURLs are a mechanism for adding a level of
> >>>> indirection without needing to move away from using 'http' URIs.
> >>>>
> >>>>> 3. The persistence of the resolution is a very separate issue!
> >>>>> It seems that it is often mixed up with other forms of persistence. In
> >>>>> a similar way that people have regularly mixed up identification and
> >>>>> location at the implementation stage. It also seems to be often
> >>>>> assumed or expected that:
> >>>>>     a. the resolution capability can or must be built into the (URI?)
> >>>>> expression of the identifier
> >>>>>     b. there will only be a single resolution of the identifier
> >>>>>
> >>>>> I think these assumptions are both false but others may disagree??
> >>>>
> >>>> Well, I certainly agree that there is confusion in these areas. :-)
> >>>>
> >>>> I also agree that b) is a false assumption, though I suspect
> >>>> that we mean different things by saying it is false.
> >>>> Resolution (to me) means resolving the identifier into one or
> >>>> more 'representations' of the resource being identified.
> >>>> There may well be other services based on the identifier
> >>>> (e.g. a DRM service) but they aren't 'resolution' services -
> >>>> they're value-added services built around the identifier.
> >>>> So, saying that identifiers need multiple resolution means
> >>>> that they should be able to resolve to multiple
> >>>> 'representations' of the resource, *not* that the resolution
> >>>> service has to offer multiple kinds of value-added services.
> >>>> Most Web servers already support multiple resolution of
> >>>> 'http' URIs thru content negotiation and the like.
> >>>>
> >>>> I think that a) is a mis-representation of the problem.
> >>>> Resolution isn't 'built into' the identifier as such - a
> >>>> protocol of some kind is required to perform the resolution.
> >>>> (In the case of 'http' URIs, HTTP is the protocol and an HTTP
> >>>> GET request is how resolution is performed). The point is
> >>>> that the identifier has to function within a technical
> >>>> environment (the Internet) which is very widely deployed.  So
> >>>> one of the major arguments used against the registration of
> >>>> new URI schemes is that that they incur a very large
> >>>> implementation cost (world-wide) because of the huge amount
> >>>> of existing software already out there that needs to be
> >>>> modified to support the new scheme.  When weighed against the
> >>>> minimal costs of re-using existing URI schemes, only new
> >>>> schemes which demonstrate benefits that outweigh that cost
> >>>> are likely to be endorsed.  So, for 'hdl' URIs to be deployed
> >>>> fully, support for that URI scheme would have to be built into the
> >>>> already deployed base of software components (browsers, caches,
> >>>> proxies, local services (to use your phrase below), etc.).  This is
> >>>> not the case currently.
> >>>>
> >>>> Now, I think that you argue below that software doesn't have
> >>>> to be modified to deal with new URI schemes (like 'hdl')
> >>>> because, somehow, the human end-users of these identifiers
> >>>> will know how to recognise them and will know where to go to
> >>>> resolve them.  I.e. I will somehow know that when I'm shown
> >>>> "10.1790/712276811646" I should change it into
> >>>> "http://hdl.handle.net/10.1790/712276811646" in order to
> >>>> resolve it?  But how am I supposed to know that?  What tiny
> >>>> proportion of Internet users currently know that?  And in any
> >>>> case, why would I ever want to show my users a string of
> >>>> numbers like that?
> >>>>
> >>>> The point is that identifiers should be as transparent as
> >>>> possible to end-users and for that to happen *all* the
> >>>> software components that are used in connection with those
> >>>> identifiers (browsers, caches, proxies, local services, etc.)
> >>>> have to have knowledge built into them about how to deal with
> >>>> the identifiers.  Until that happens, the end-user's
> >>>> experience of the identifier will be far from transparent.
> >>>>
> >>>> So, as a concrete example, DOIs are a good example of a
> >>>> non-transparent identifier because they do not work
> >>>> seamlessly in the majority of currently deployed software
> >>>> components.  They do work well in closed-world
> >>>> implementations like CrossRef, but only because knowledge of
> >>>> the DOI has been built into that particular application.
> >>>>
> >>>> Currently, in the context of browsing the Web, a 'doi' URI is
> >>>> only transparent if I'm using IE and I have the DOI plug-in
> >>>> installed.  If I use any other browser, or if I don't have
> >>>> the plug-in installed, then it isn't transparent because I
> >>>> have to take some manual action in order to deal with the DOI.
> >>>>
> >>>>> For example a user/local system may wish to check for resolution of id
> >>>>> xxx via a number of preferred services e.g. in the order
> >>>>> http://www.local.org.uk/mydept/xxx
> >>>>> http://www.bath.ac.uk/xxx
> >>>>> http://www.ukoln.ac.uk/xxx
> >>>>> http:///www.tsoid.org.uk
> >>>>> As last resort if they all fail then if it is known (suspected) to be a
> >>>>> Handle for example try
> >>>>> http://hdl.handle.net/xxx (if it is known to be a Handle)
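
The ordered fallback Mike describes could look something like the Python
sketch below, as one possible reading rather than anything specified in the
thread. The function names are hypothetical, and the fetch argument is a
stand-in for whatever a local system uses to look a URL up; it is passed in
so that the sketch makes no network calls itself.

```python
# Assumption: the last-resort proxy is hdl.handle.net, as in the quoted example.
HANDLE_PROXY = "http://hdl.handle.net"


def candidate_urls(identifier, preferred_resolvers, known_handle=False):
    """List the URLs to try, in the user's preferred order."""
    urls = [f"{base.rstrip('/')}/{identifier}" for base in preferred_resolvers]
    if known_handle:
        # Only tried last, and only if the id is known (or suspected) to be a Handle.
        urls.append(f"{HANDLE_PROXY}/{identifier}")
    return urls


def resolve(identifier, preferred_resolvers, fetch, known_handle=False):
    """Return (url, result) for the first service that answers, else None."""
    for url in candidate_urls(identifier, preferred_resolvers, known_handle):
        result = fetch(url)          # fetch returns None when the service has no answer
        if result is not None:
            return url, result
    return None


resolvers = [
    "http://www.local.org.uk/mydept",
    "http://www.bath.ac.uk",
    "http://www.ukoln.ac.uk",
]
print(candidate_urls("xxx", resolvers, known_handle=True))
```

Note that the list of resolvers and the "is this a Handle?" flag both live in
the local system's configuration, which is exactly the knowledge the reply
below questions.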
> >>>>
> >>>> But how does the user know that they have to do this?  And
> >>>> while I agree that local systems could be configured to do it
> >>>> for people, the point is that, in the global context of the
> >>>> Internet, *all* local systems have to be modified to work
> >>>> this way because otherwise there isn't any global
> >>>> predictability about whether/how the identifier is going to work.
> >>>>
> >>>>> The doi 10.1790/712276811646 can already be resolved via several
> >>>>> domains even though they all point to the same place. It can also be
> >>>>> very effective to effect a Google search on xxx rather than the whole
> >>>>> uri (try it with 10.1790/712276811646 for example).
> >>>>
> >>>> But this is also an example of the problem!  Google already
> >>>> has built-in knowledge about 'http' URIs and it therefore
> >>>> supports various kinds of 'rich' searches based on them.
> >>>> Contrast this with the rather simplistic text-string search
> >>>> that you have to do to find a DOI.  In short, Google is a
> >>>> good example of a 'local system' that doesn't have any
> >>>> knowledge of DOIs built in, and that therefore can't really
> >>>> deal with them in an effective way.
> >>>>
> >>>>> In addition the system may be set up to check one or more digital
> >>>>> rights management services to see if there are any usage restrictions.
> >>>>> http://www.digitalrightsmanager.com/xxx
> >>>>>
> >>>>> So when it is said that hdl:10.1790/712276811646 or even just
> >>>>> 10.1790/712276811646 is not globally unique then that may become an
> >>>>> issue, but if it is known that it is Handle or some other well
> >>>>> managed name space it is not a problem.
> >>>>
> >>>> Once again, how do I know that '10.1790/712276811646' is a
> >>>> Handle?  The form prefixed by 'hdl:' is better because if the
> >>>> syntax that I'm dealing with (e.g. XHTML or an XML schema)
> >>>> tells me to expect a URI and I see 'hdl:10.1790/712276811646'
> >>>> then I know, in some sense, that I've got a Handle.
> >>>>
> >>>> This problem is further compounded because the 'hdl' URI
> >>>> scheme isn't registered yet.  Therefore I actually have no
> >>>> guarantees about uniqueness or persistence - because it is
> >>>> the scheme registration process that gives me those two things.
> >>>>
> >>>> So, to refer back to a couple of the comments you made right
> >>>> at the beginning, not only is it critical that the identifier
> >>>> is a valid URI, but it also *must* be a registered URI -
> >>>> because it is those two features that provide us with the
> >>>> global Internet context that ensure uniqueness and
> >>>> persistence.  (And it must be a URI scheme supported by *all*
> >>>> the deployed software components that make up the Internet).
> >>>>
> >>>>> But when it is said hdl:10.1790/712276811646 or even just
> >>>>> 10.1790/712276811646 is not resolvable **on its own** I would say that
> >>>>> is intended and very desirable.
> >>>>
> >>>> I think this statement runs to the heart of our
> >>>> differences...  I would argue that this feature is neither
> >>>> intended nor desirable!  My guess is that Handles were
> >>>> designed to be resolvable, but that the Handle software
> >>>> hasn't been widely enough deployed in browsers, etc. in order
> >>>> that the majority of Internet users can take advantage of
> >>>> their resolution capability.
> >>>>
> >>>>> It is very likely that anyone who exposes the Handle id
> >>>>> 10.1790/712276811646 will also prefix it with one or more domains that
> >>>>> can resolve it. So the resolution and identification may be contained
> >>>>> in a single uri but the id and resolution are, and can be, separated.
> >>>>
> >>>> Just to clarify, I presume you mean prefixed by something to
> >>>> form an 'http' URI?
> >>>>
> >>>> In which case, I would argue that it is the 'http' URI that
> >>>> functions as the identifier, not the unprefixed string?  But
> >>>> I agree that one could argue it both ways.
> >>>>
> >>>>> If the id is for example a url are there any syntax problems with
> >>>>> for example trying to resolve
> >>>>> http://www.egu.gov.uk/http://www.tsoid.org.uk/xxx  ???
> >>>>
> >>>> I'm not 100% sure what you are asking here, nor why?  The URI
> >>>> above is invalid I think, but proper character encoding could
> >>>> be used to make it valid.  But... so what!?
> >>>>
> >>>>> Summary
> >>>>> The main (or even sole) purpose of a digital identifier is to maintain
> >>>>> the globally unique persistence of the identifier - resource
> >>>>> relationship.
> >>>>>
> >>>>> The persistence of the resolution is separate and secondary, but still
> >>>>> important.
> >>>>
> >>>> I wonder if it is always the case that resolution is secondary?
> >>>>
> >>>>> This resolution may be done independently by multiple communities or
> >>>>> organisations, possibly selected as trusted services by the user.
> >>>>
> >>>> I certainly don't disagree with this.  But it misses the more
> >>>> important point - that the identifier needs to work
> >>>> seamlessly in the context of the currently deployed Web or
> >>>> that we need to have a way of getting very widespread
> >>>> adoption of a new identifier by all parts of the Web
> >>>> infrastructure in order that it will work seamlessly.
> >>>> Clearly, the latter approach will not be easy, especially
> >>>> given that it is nearly impossible to register new URI
> >>>> schemes.  In any case, the hard work really only starts once
> >>>> registration has happened - the hard work is in getting the
> >>>> majority of technology developers to adopt it into their products.
> >>>>
> >>>> Andy
> >>>> --
> >>>> Distributed Systems, UKOLN, University of Bath, Bath, BA2 7AY, UK
> >>>> http://www.ukoln.ac.uk/ukoln/staff/a.powell/       +44 1225 383933
> >>>> Resource Discovery Network http://www.rdn.ac.uk/
> >>>> ECDL 2004, Bath, UK - 12-17 Sept 2004 - http://www.ecdl2004.org/