JISCMail - DC-SOCIAL-TAGGING Archives

Andrew said:

> It's not clear to me why you would want to invent a new 
> namespace for the definition of dctag when it was related to 
> but not purporting to be dc:subject.  This seems similar to 
> what is done with encoding schemes, e.g., dct:DDC, dct:LCC, 
> dct:LCSH, dct:MESH, dct:NLM, dct:TGN and dct:UDC. Taken from 
> "Expressing Qualified Dublin Core in RDF /XML" document:
>
>  <dc:subject>
>    <dcterms:MESH>
>      <rdf:value>D08.586.682.075.400</rdf:value>
>      <rdfs:label>Formate Dehydrogenase</rdfs:label>
>    </dcterms:MESH>
>  </dc:subject>
> 
> It seems like you could just define a new encoding scheme, 
> e.g., dcterms:TAG, to handle the semantics of social tagging.  
> However, that might not be enough.  Organizations such as 
> Flickr, YouTube, etc. may desire slightly different semantics 
> for their social tagging.  DCMI probably doesn't want to keep 
> defining new encoding schemes on a regular basis.

Leaving aside the "which namespace do we use" issue for a second, I
think we need to be careful not to confuse two very different types of
thing, two different types of term used in DC metadata: properties and
vocabulary encoding schemes.

A property is a specific type of relationship. The dc:subject property
is one specific type of relationship, defined and named by DCMI (using a
DCMI-owned URI) and described by DCMI in human-readable terms as "The
topic of the content of the resource." 

A vocabulary encoding scheme, on the other hand, is something quite
different. According to the DCMI Abstract Model, it is a class of which
the value is an instance. N.B. This is one of the areas where a change
is suggested in the proposed revisions to the DCAM - the suggestion is
that we change the concept of VES to something like "an enumerated set
of which the value is a member", and that is _not_ represented as an
instance/class relationship, but for the purposes of this discussion I
don't think that matters too much. The point is that a VES is a
different thing from a property and "plays a different role" in DC
metadata.

So, when I use the dc:subject property in an RDF triple or a DC
statement, I'm making an assertion that

resource:A has-as-topic resource:B   

Or maybe more colloquially

resource:A is-about resource:B

I could specify that resource:B is an instance/member of dcterms:LCSH or
an instance/member of dcterms:DDC (i.e. I could specify a vocabulary
encoding scheme for the value). That provides some additional
information about the value - it's an instance/member of some specified
class/set - but that doesn't change the nature of the relationship that
I'm asserting between resource:A and resource:B. The property referred
to in my triple/statement is still the same: the dc:subject property.
I'm still asserting a "has-topic"/"is-about" relationship.
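To make that point concrete, here is a minimal sketch that treats triples as plain Python tuples. The example.org URIs are hypothetical stand-ins for resource:A and resource:B, and the encoding scheme is expressed as an rdf:type triple, following the class/instance reading in the current DCAM; it is an illustration under those assumptions, not a definitive encoding.

```python
# Triples as plain (subject, predicate, object) tuples; no RDF library needed.
DC_SUBJECT = "http://purl.org/dc/elements/1.1/subject"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
DCTERMS_LCSH = "http://purl.org/dc/terms/LCSH"

# Hypothetical URIs standing in for resource:A and resource:B.
RESOURCE_A = "http://example.org/resource/A"
RESOURCE_B = "http://example.org/resource/B"


def relationships(triples, subject, obj):
    """Return the set of properties asserted between subject and obj."""
    return {p for (s, p, o) in triples if s == subject and o == obj}


# The statement on its own: resource:A has-as-topic resource:B.
plain = {(RESOURCE_A, DC_SUBJECT, RESOURCE_B)}

# The same statement, plus a vocabulary encoding scheme for the value.
with_ves = plain | {(RESOURCE_B, RDF_TYPE, DCTERMS_LCSH)}

# The extra triple describes resource:B only; the relationship between
# resource:A and resource:B is the dc:subject property in both cases.
assert relationships(plain, RESOURCE_A, RESOURCE_B) == {DC_SUBJECT}
assert relationships(with_ves, RESOURCE_A, RESOURCE_B) == {DC_SUBJECT}
```

Swapping dcterms:LCSH for dcterms:DDC, or for any other scheme, changes only the second triple.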

If resource:B is a tag, and I use it as the object in an RDF triple or a
DC statement with the dc:subject property, then I'm making an assertion
that

resource:A has-as-topic tag:T   

Or

resource:A is-about tag:T

I could specify that tag:T is an instance/member of petej:TagSet (i.e. I
could specify a vocabulary encoding scheme for that value), but - as in
the example above of LCSH and DDC - adding the vocabulary encoding
scheme provides additional information about the value, but it does not
change the assertion I am making about the nature of the relationship
between resource:A and tag:T. It's the property which specifies the
nature of the relationship, and as long as I'm using the dc:subject
property, I'm asserting a "has-topic"/"is-about" relationship.

And in my previous message, I was arguing that when people "tag"
resources, yes, they are asserting a relationship between the tagged
resource and a tag (but see also note below), but it is _not_ true that
the relationship they are asserting is always a "has-topic"/"is-about"
relationship. On the contrary, people use tagging to represent all sorts
of relationships - ownership, status, "rating", related-location. A
resource tagged "to-read" on del.icio.us isn't "about" the concept of
not having been read yet. Well, yes, I accept that somewhere out there
someone has written a weblog post describing the pile of paperbacks on
their bedside table and a del.icio.us user has indeed tagged it as
"to-read" with that notion in mind, but in the vast majority of cases
that isn't what is going on! ;-)

So representing all tagged-resource/tag relationships as statements
using the dc:subject property not only fails to capture the particular
relationship that someone had in mind when they tagged a resource, but
asserts a relationship which - in many cases - the tagger did not
intend.
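As a rough illustration of the problem, the sketch below exports a handful of made-up bookmarks using a blanket dc:subject mapping. The bookmark data, the "intent" labels and the tag URI pattern are all hypothetical, invented only for this example; no real service's export format is being reproduced.

```python
DC_SUBJECT = "http://purl.org/dc/elements/1.1/subject"

# Hypothetical bookmarks: (resource URI, tag label, what the tagger meant).
bookmarks = [
    ("http://example.org/post/1", "metadata", "topic"),
    ("http://example.org/post/1", "to-read", "status"),
    ("http://example.org/photo/7", "bath", "related-location"),
]


def tag_uri(label):
    """Hypothetical tag URI pattern, used only for this illustration."""
    return "http://example.org/tag/" + label


def export_as_subject(bookmarks):
    """Blanket mapping: every tagging becomes a dc:subject statement."""
    return [(res, DC_SUBJECT, tag_uri(label)) for res, label, _ in bookmarks]


triples = export_as_subject(bookmarks)

# Every exported triple claims "is-about", whatever the tagger had in mind.
assert all(p == DC_SUBJECT for _, p, _ in triples)

# Taggings that were not about topic but are now asserted as topics.
misrepresented = [label for _, label, intent in bookmarks if intent != "topic"]
assert misrepresented == ["to-read", "bath"]
```

The intent column is thrown away by the export, and the consumer of the triples is told something the tagger never said.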

(Actually, I should have highlighted yesterday that del.icio.us doesn't
only represent tags using dc:subject, it also uses the property
http://purl.org/rss/1.0/modules/taxonomy/topics from the RSS taxonomy
module. But I'd suggest that the same issue arises. Tagging is used with
intent other than to indicate a has-topic relationship.) 

So, in the general case, a property other than dc:subject (or
taxo:topic/taxo:topics) would be required. You could argue that the
dc:relation property does the job - there is some unspecified type of
relationship between the resource and the tag - or you could argue for
a more specific "is-associated-with-tag" or "is-tagged-with" property.
I'd argue against putting "subject" in the name/URI because I think we
want to avoid suggesting (even to a human reader) any relationship with
the dc:subject property.
 
And indeed the ontology I referred to yesterday provides such a property

http://www.holygoat.co.uk/owl/redwood/0.1/tags/taggedWithTag

Described as "Indicates that the subject has been tagged with the object
tag. This does not assert by who, when, or why the tagging occurred. For
that information, use a reified Tagging resource."

So we can say

resource:A tags:taggedWithTag tag:T

The final part of that description is what I was referring to in my "but
see also note below" above. Depending on what information it is
desirable/necessary/useful to capture about the "tagging", then you may
wish to adopt the approach of describing that "event" in more detail. If
I understand it correctly, the ontology supports both the simple 

resource:A tags:taggedWithTag tag:T

approach, and it also supports a richer, more complex approach which
seeks to represent more of the "context", particularly the agent who
performed it and the point in time they did so, by representing a
"tagging event" as a resource ("reifying the tagging").

See http://www.holygoat.co.uk/projects/tags/ for more discussion and
examples.
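The two approaches might be sketched as follows, again with triples as plain Python tuples. Only tags:taggedWithTag is quoted above; the names Tagging, taggedResource, associatedTag, taggedBy and taggedOn are my assumption about how the ontology names these things and should be checked against the ontology itself. The example.org URIs and the date are hypothetical.

```python
TAGS = "http://www.holygoat.co.uk/owl/redwood/0.1/tags/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def simple_tagging(resource, tag):
    """Simple approach: one triple, saying nothing about who, when or why."""
    return {(resource, TAGS + "taggedWithTag", tag)}


def reified_tagging(tagging, resource, tag, agent, date):
    """Richer approach: the tagging event is itself a described resource.

    Term names below other than the namespace are assumptions, not
    confirmed against the ontology.
    """
    return {
        (tagging, RDF_TYPE, TAGS + "Tagging"),
        (tagging, TAGS + "taggedResource", resource),
        (tagging, TAGS + "associatedTag", tag),
        (tagging, TAGS + "taggedBy", agent),
        (tagging, TAGS + "taggedOn", date),
    }


# Hypothetical identifiers for illustration only.
resource_a = "http://example.org/resource/A"
tag_t = "http://example.org/tag/T"

simple = simple_tagging(resource_a, tag_t)
rich = reified_tagging(
    "http://example.org/tagging/1",
    resource_a,
    tag_t,
    "http://example.org/people/someone",
    "2006-01-01",
)

# In the reified form the agent and date hang off the tagging event,
# not off the tagged resource or the tag.
subjects = {s for (s, _, _) in rich}
assert subjects == {"http://example.org/tagging/1"}
```

In neither form is a "has-topic" relationship asserted between the resource and the tag.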

> However, DCMI would not need to define new encoding schemes 
> on a regular basis since the above qualified Dublin Core 
> really boils down to:
> 
>  <dc:subject>
>    <rdf:Description>
>      <rdf:type rdf:resource="http://purl.org/dc/terms/MESH"/>
>      <rdf:value>D08.586.682.075.400</rdf:value>
>      <rdfs:label>Formate Dehydrogenase</rdfs:label>
>    </rdf:Description>
>  </dc:subject>
> 
> Which generates the same exact RDF triples.  This implies 
> that anybody can create new encoding schemes and semantics, 
> albeit not in the dcterms: namespace since it is controlled 
> by DCMI, and still be compatible with the DCMI model.  So if 
> Flickr wanted to define their definition of tags they could just do:
> 
>  <dc:subject>
>    <rdf:Description>
>      <rdf:type rdf:resource="http://www.flickr.com/photos/tags/"/>
>      <rdf:value>D08.586.682.075.400</rdf:value>
>      <rdfs:label>Formate Dehydrogenase</rdfs:label>
>    </rdf:Description>
>  </dc:subject>
> 
> Which would provide interoperability with Dublin Core without 
> DCMI lifting a finger.  Internally at OCLC, for research 
> projects, we have been using this interoperability practice 
> for defining new encoding schemes to controlled vocabularies 
> that DCMI has not defined.
> For example, GSAFD, NGL, RVM, etc.

Oh, yes, in terms of the name/URI for the term, I quite agree that
there's no requirement that a DCMI-owned URI is assigned. I don't mind
whether it's a DCMI-owned URI or a URI owned by another agency (as long
as it's an agency I trust to (a) manage their URIs sensibly so as to
ensure a "reasonable" degree of persistence and (b) provide consistent
representations of the identified resources in a way which makes those
representations accessible to my tools using simple widely-deployed
mechanisms in accordance with W3C guidelines).

But (IMHO) the requirement would not be satisfied by coining a new
vocabulary encoding scheme, whether that scheme was identified by a
DCMI-owned URI or a URI owned by another agency.

Pete