Dear all,
I'd like to share some insightful comments from Dan Brickley
about what has made the Semantic Web message more difficult
to convey than some of us had expected.
As the comments were made on a closed list, I have with Dan's
permission removed the context from the excerpts below.
Tom
Dan was asked why it has taken since 1998 to get the world to
understand what can be achieved with URIs and 3-tuple data
representations. Dan's reply:
Part of our problem, I fear, is that we have collectively tended to
approach the situation with an essentially evangelical style.
Time and again, this has got smart people interested and intrigued,
and so they go try out some RDF tools.
Very often this is a frustrating experience. And there are good
technical reasons why working with RDF (* or any other '3-tuple based
Structured Data Representation' *) will often be frustrating. The
3-tuple approach thrives in chaotic situations where data flows
around, with bits missing, bits added, extensions and gaps everywhere.
This kind of data is intrinsically rather annoying to deal with. There
are workarounds and strategies (details on request :) but that
frustration is inevitably core to the experience, because it is a set
of problems the RDF data model was designed to engage with.
So http://www.w3.org/DesignIssues/LinkedData.html marked a turning
point when TimBL took FOAF's RDF linking model, improved it by
demanding URIs everywhere (rather than our earlier bNodes and
seeAlsos), and inspired mass publication of RDF data. Until we had
data, few were RDF-curious. Now we have data, we can disappoint more
curious new people per month than ever before. Or on a good day, make
them happy.
The Semantic Web project has delivered four specific things to
the world so far: data, tools, community and standards.
Because it grew from a standards organization, the tendency has been
to focus on the standards, and what they do to improve the world - the
3-tuple model as seen in RDF, and the specs that build on top of it
(SPARQL, RDFS/OWL etc.).
Now standards are great, but they're pretty distant from solving
day-to-day problems. And there are good reasons to believe that
3-tuple data structures will typically be annoying to use, as well as
useful. They only really shine when multiple parties are using them in
complementary ways, so that data can be usefully mixed and merged and
extended and overlaid and so forth.
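A minimal sketch of that mixing and merging, as an illustration only. It assumes triples are plain Python tuples and a graph is a set of them; the example.org URIs and both publishers are hypothetical. Plain set union is only a safe merge here because neither graph contains bNodes, which would otherwise need renaming apart first.

```python
# A graph is modelled as a set of (subject, predicate, object) tuples.
FOAF = "http://xmlns.com/foaf/0.1/"
ALICE = "http://example.org/alice#me"  # hypothetical URI

# Hypothetical publisher A knows a name and a homepage.
graph_a = {
    (ALICE, FOAF + "name", "Alice"),
    (ALICE, FOAF + "homepage", "http://example.org/alice/"),
}

# Hypothetical publisher B repeats the name and adds a foaf:knows link.
graph_b = {
    (ALICE, FOAF + "name", "Alice"),
    (ALICE, FOAF + "knows", "http://example.org/bob#me"),
}

# Merging URI-only graphs is set union: duplicate triples collapse,
# new triples are simply added alongside the old ones.
merged = graph_a | graph_b


def describe(graph, subject):
    """Collect every (predicate, object) pair known about one subject."""
    return sorted((p, o) for s, p, o in graph if s == subject)


for predicate, obj in describe(merged, ALICE):
    print(predicate, obj)
```

Neither publisher had to coordinate with the other beyond using the same URI for Alice and the same vocabulary; that is the complementary use the paragraph above describes.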
So getting those big public, link-friendly datasets out there was a
foundation for RDFy 3-tuple data becoming more useful than it was
annoying. But it's still annoying for developers, trust me! Having
solid standards with test cases (the RDFCore 2004 revision of RDF) was
a good step forward, but still standards alone are not enough. The
missing ingredients are tooling and community. Both of which we have,
both of which we can always benefit from more/better. So communities
like the RDF/SW interest group at W3C, like Lotico, like the LOD group
which bridged W3C's scene with the outside world, these help new
adopters make the most of the 3-tuple model. I've seen quite a few
efforts burned by mis-applying RDF in contexts where it just wasn't
important or useful to use it. That's natural with a newish
technology. And I've seen smart developers frustrated by the lack of
documentation, polish and guidance around our tooling. But the growing
suite of RDF-oriented tools can't be ignored, and that's a key part of
the technology's appeal.
We have data, now, and that's enough to attract people. But as seen in
discussions around eg. data.gov.uk, many mainstream developers see
RDF, SPARQL and 3-tuples and associated tools as a hurdle or barrier
that stands between them and data. In a way, they're right. We have
all these standards and tools as a means to an end (sharing
information, the Web's founding slogan
http://www.w3.org/Illustrations/LetsShare.ai.gif "Let's share what we
know"). RDF is not an end in itself.
So imho the message should not be "we've found the best technical
model for sharing data on a global scale - URI-linked 3-tuples!", but
rather, that we have a global community committed to sharing data,
tools, standards and their own experience and time in pursuit of
solving problems through information linking. This doesn't mean that
all tools need be opensource, nor all data public, but that there are
common architectural principles giving coherence to all this data, all
those tools...
All the time we frame this as "RDF is 'easier/better' than
[wonder-technology X]" we will lose. It's not. And nor is any vaguer
notion of "3-tuples with URI" [...]. What we have here in
the Semantic Web effort that is unique is a special combination of
data, tooling, standards and community that simply can't be found
anywhere else...
And to a follow-up question on exactly what problems people
and developers have with 3-tuples, or what they would rather have
in their place...:
I think it's not so much the 'what they get back' (API/format/model),
but the whole framework of how we structure our data.
If you're used to XML or SQL schema structures, the schema designer is
typically (not necessarily) in a much more authoritative role. With
RDFS we stripped a lot of power away from schema designers: they can't
tell you what to do any more! There's no "a shipping order *must* have
an address" mechanism in RDFS/OWL. For example, as editor of the FOAF
vocab's RDFS I can never say anything in an imperative style in the
schema, all I can do is define the meaning of the classes and
properties in the FOAF namespace. Same for the Dublin Core team, for
SIOC, etc. This permissiveness encourages re-use in lots of different
ways.
This is simultaneously critical for scaling to the Web, but also, as I
say, annoying to be on the receiving end of. For developers trained in
the idea that schemas tell you what is or is not an acceptable
instance, RDF is strangely passive. The only formal way of screwing up
in RDF is contradicting yourself. Someone could publish a FOAF-based
RDF/XML document that was simply a collection of triples using
'foaf:homepage'. Even with bNodes on either side of the property. Or
someone else might publish a bunch of <foaf:Image about="uri"
dc:title="...."/> triples. The FOAF vocabulary facilitates this, and
that is useful, but it also means that knowing the vocabulary is not
itself enough for interop. You only get interop when a bunch of folk
do things in roughly the same way; using the same triple patterns.
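The point can be sketched in a few lines, purely as an illustration. The three documents, the `match` helper and the `wanted` pattern are all hypothetical, triples are plain tuples, terms beginning with "?" stand for variables, and the dc: namespace is assumed to be Dublin Core Elements 1.1. All three documents are legitimate uses of the vocabulary, but a consumer looking for one particular triple pattern finds it in only one of them.

```python
FOAF = "http://xmlns.com/foaf/0.1/"
DC = "http://purl.org/dc/elements/1.1/"  # assumed namespace for dc:title
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Document 1: nothing but foaf:homepage, with bNodes on either side.
doc_homepages = {("_:b1", FOAF + "homepage", "_:b2")}

# Document 2: a typed foaf:Image with a dc:title.
doc_images = {
    ("http://example.org/pic1.jpg", RDF_TYPE, FOAF + "Image"),
    ("http://example.org/pic1.jpg", DC + "title", "A picture"),
}

# Document 3: follows the idiom this particular consumer expects.
doc_idiom = {
    ("_:p", FOAF + "name", "Alice"),
    ("_:p", FOAF + "homepage", "http://example.org/alice/"),
}


def match(graph, patterns, binding=None):
    """Yield variable bindings that satisfy every triple pattern."""
    binding = binding or {}
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for triple in graph:
        candidate = dict(binding)
        ok = True
        for term, value in zip(first, triple):
            if term.startswith("?"):
                # A variable must keep the same value across all patterns.
                if candidate.setdefault(term, value) != value:
                    ok = False
                    break
            elif term != value:
                ok = False
                break
        if ok:
            yield from match(graph, rest, candidate)


# The consumer's idiom: something with both a name and a homepage.
wanted = [
    ("?person", FOAF + "name", "?name"),
    ("?person", FOAF + "homepage", "?page"),
]

for label, doc in [("homepages", doc_homepages),
                   ("images", doc_images),
                   ("idiom", doc_idiom)]:
    print(label, len(list(match(doc, wanted))))
```

Only the third document yields a match, even though none of the three breaks any rule of the schema.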
There's a whole layer to do with characterising more specific triple
patterns, 'idioms', that is essentially missing from our collective
practice. There have been experiments in various directions towards
characterising such patterns (eg. using SPARQL, see Schemarama...) but
as a community we seem to act as if schemas are all that's needed.
As Edd Dumbill put it (http://times.usefulinc.com/#13:13 via
http://danbri.org/words/page/27?sioc_type=user&sioc_id=22 )
"Processing RDF is therefore a matter of poking around in this graph.
Once a program has read in some RDF, it has a ball of spaghetti on its
hands. You may like to think of RDF in the same way as a hashtable
data structure -- you can stick whatever you want in there, in whatever
order you want."
This loose nature is the key at once to our success and to our
problems. The analogy is with developers who are used to nice (if a
little brittle/rigid) OO models: they are not always happy replacing
everything with a chaotic hashtable. At least not unless we have a
good set of unit tests. And what we're missing, by analogy, is just
that. Nobody knows when they've been passed a 'good' RDF graph, versus
one so uninformative, or expressed in such alien terminology, that it
can't be used for the task at hand. So some of the essential ideas
from non-RDF development just don't really make sense when using
unconstrained triples. That leads to headaches, frustrations etc.
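By way of illustration, a task-specific check of the kind that analogy suggests might look like the sketch below. It is an assumption about what such a "unit test" could be, not an existing tool: the `usable_subjects` helper, the sample graph and the choice of required properties are all hypothetical, and triples are again plain tuples.

```python
FOAF = "http://xmlns.com/foaf/0.1/"


def usable_subjects(graph, required):
    """Split subjects into those carrying every required predicate and the rest."""
    seen = {}
    for subject, predicate, _ in graph:
        seen.setdefault(subject, set()).add(predicate)
    good = {s for s, preds in seen.items() if required <= preds}
    return good, set(seen) - good


# Hypothetical task: build an address book, which needs a name and a mailbox.
required_for_task = {FOAF + "name", FOAF + "mbox"}

# Hypothetical incoming graph: valid RDF throughout, but uneven in detail.
incoming = {
    ("http://example.org/alice#me", FOAF + "name", "Alice"),
    ("http://example.org/alice#me", FOAF + "mbox", "mailto:alice@example.org"),
    ("http://example.org/bob#me", FOAF + "homepage", "http://example.org/bob/"),
}

good, unusable = usable_subjects(incoming, required_for_task)
print("usable:", sorted(good))
print("not usable for this task:", sorted(unusable))
```

The check says nothing about whether the graph is valid RDF, which it always is; it only reports whether the graph is informative enough for one consumer's task.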
--
Thomas Baker