Brian,
[Apologies for the delay in reply: post launch issues...]
Yes. Behind the 'Semantic Web' lies the development of schemas which are
flexible enough to accommodate the breadth of content making up the WWW
(now and future) - ideally.
The example you use demonstrates content* (see footnote below) 'within'
a schema: the content is fitted to the schema by inserting the
appropriate markup thereby breaking it up into its constituent parts
('fields' in your example). This transforms the content so that it sits
somewhere between unstructured and structured.
'The metadata is the data' means that either the metadata can be
reliably extracted from the structured data as and when it is required
or that there is enough structure to obviate the need for metadata. The
issue is that metadata is data about data i.e., it represents data but
is an entity in and of itself regardless of where it is located (i.e.,
within or between resources). Moreover, an ideal is for metadata to be
unnecessary as the data will be sufficiently structured to support
inference i.e., a description will be unnecessary as examination of the
data will be sufficient. [If you haven't noticed, a lot of this is
actually AI]
It is probably uncontroversial to predict that only a subset of the
WWW's content will be structured in the way you describe and most
probably this will concern content which is already structured (that
which is currently in databases, etc.): arguably structured in nature
anyway e.g., a personnel record. However, the majority of the WWW's
content is textual/visual e.g., an illustrated poem or perhaps a
strategic plan. Here, applying structure requires commitment in terms of
the intended purpose/use and this is a real can of worms: ambiguity,
multiplicity of schemas, absence of consensus, etc. The primary concern
is the reader (people not computers), and visual schemas will dominate.
Communication between people is absolutely key, whether immediate
(voice, etc.) or latent (text, etc.). Getting metadata, let alone
structuring the data itself, is fruitless without sufficient command and
control (either forced through the management of people or the tools
used) - getting people to produce metadata voluntarily simply does not
work, at least not without intrinsic reward...
So the Semantic Web needs to accommodate structured content (with or
without metadata) as well as unstructured content with associated
metadata. Fundamental to success is the way in which
content is authored/created. If the tools used are not enforcing the
schemas (for content or metadata), it will fail - or to put it another
way, partially succeed... i.e., semantic webs. Where does this leave the
unstructured content without metadata? More to the point, where does
this leave users?
As for the 'magic', this is a community of information professionals
(librarians, analysts, developers, etc.) behind the scenes who will
enable this in part or whole. They will be developing the schemas (and
schemes) as well as the mappings between schemas (and schemes) for both
content and the metadata to describe content. Meaning and knowledge
exist in people's heads and cannot be explicitly represented, at least
in terms of predicate logic (the 'maths') that RDF is based upon.
Automatically generated RDF is therefore unlikely although no doubt
we'll be exposed to a few instances where this has been achieved.
However, for those managing information services, exceptions are of
limited use.
To couch this in IWMW terms, we're back to gurus again; there aren't
enough of them around to get the job done now that the WWW is pervasive!
Stephen...
* I use the term 'content' to refer to data, information and more:
poetry is neither data nor information. To keep the email short, these
terms are used interchangeably.
-----Original Message-----
From: Brian Kelly [mailto:[log in to unmask]]
Sent: 24 June 2003 11:21
To: Emmott,Stephen; [log in to unmask]
Subject: Semantic Web and UK HEIs (was RE: New LSE website launched 23rd
June)
...
> I'd welcome constructive criticism from colleagues at other HEIs and
> would encourage a debate on our ability as a community to make a
> transition to the 'semantic web'. One question I always ask regarding
> metadata: Where are the tools? (i.e., tools that the owners/
> publishers of content can use)
Hi Stephen
As the person who chose the topic of the Semantic Web as a plenary
talk at the recent Institutional Web Management Workshop I guess I
should respond :-)
I am very much aware that there is not a clear understanding of what
is meant by the Semantic Web and what we can gain from it. Let me give
you my views.
With a traditional XML-based Web you can do lots of useful things. As
you've done at LSE, you store your data in XML and use XSLT to transform
it to XHTML. You could also use XSLT to transform it to other formats.
However if a third party wishes to integrate your data with theirs
and with other data, there is a problem. You will have defined your
fields (your XML Schema - i.e. <STUDENT-NUMBER>, <STAFF-ID>,
<VICE-CHANCELLOR>, etc.) according to local needs. Other organisations
will use different schemas. So to merge the data or search across
different data sets we need either to standardise our schemas
(politically difficult), put the knowledge in the applications
(expensive, not scalable) or adopt a mechanism which allows different
schemas to be integrated. The Semantic Web provides a solution to this
latter approach.
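A minimal sketch of that third option, assuming two hypothetical local
records and made-up example.org URIs (none of these names come from LSE
or from any real vocabulary): each organisation keeps its own field
names and publishes a mapping from them to shared identifiers, so a
third party can merge records without either side changing its schema.

```python
# Hypothetical shared identifiers; the example.org URIs are illustrative only.
STAFF_ID = "http://example.org/terms/staffId"
NAME = "http://example.org/terms/name"

# Each organisation maps its own local field names to the shared URIs.
MAPPING_A = {"STAFF-ID": STAFF_ID, "FULL-NAME": NAME}
MAPPING_B = {"employee_no": STAFF_ID, "display_name": NAME}


def to_shared(record, mapping):
    """Re-key a local record by shared URI, dropping unmapped fields."""
    return {mapping[k]: v for k, v in record.items() if k in mapping}


record_a = {"STAFF-ID": "1001", "FULL-NAME": "A. Example", "ROOM": "B12"}
record_b = {"employee_no": "2002", "display_name": "B. Example"}

merged = [to_shared(record_a, MAPPING_A), to_shared(record_b, MAPPING_B)]

# Both records can now be searched with one key, whatever the local schema.
staff_ids = [r[STAFF_ID] for r in merged]
print(staff_ids)
```

The local field ROOM has no shared equivalent in this sketch, so it is
simply left out of the merged view rather than guessed at.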
As an example have a look at http://triplestore.aktors.org/ (having
first installed Mozilla, as this only works in Mozilla). This work has
been carried out by a research group at Southampton University.
This takes data from a number of sources (e.g. the RAE data which is
held on HERO) and converts this to RDF (using an HTML scraping approach).
This can then be integrated with data from other sources - as can be
seen if you have a play in Mozilla.
Rather than a research group converting the data to RDF (and maybe
getting it wrong) it would be better if the data owner made their data
available in RDF. This could then be integrated with third party data.
The bits of magic that make this possible are RDF and URIs. RDF is
an XML format which includes a mathematical expression which defines
relationships between resources. The relationships are not defined in
the RDF language but at a URI - so RDF is extensible.
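To illustrate the idea in plain Python (a sketch only: it models RDF
statements as subject-predicate-object tuples rather than using RDF/XML
or a real RDF library, and every URI below is a made-up example.org
placeholder):

```python
# Each statement is a (subject, predicate, object) triple. Predicates are
# URIs, so their meaning is defined outside the data itself.
WORKS_AT = "http://example.org/terms/worksAt"      # hypothetical
RATED = "http://example.org/terms/researchRating"  # hypothetical

source_one = {
    ("http://example.org/people/1", WORKS_AT, "http://example.org/dept/x"),
}
source_two = {
    ("http://example.org/dept/x", RATED, "5"),
}

# Merging two sources is just a set union, because both use URIs as names.
graph = source_one | source_two


def objects(graph, subject, predicate):
    """Return every object stated for the given subject and predicate."""
    return {o for s, p, o in graph if s == subject and p == predicate}


# Follow a link from one source into the other: person -> dept -> rating.
ratings = set()
for dept in objects(graph, "http://example.org/people/1", WORKS_AT):
    ratings |= objects(graph, dept, RATED)
print(ratings)
```

The point of the sketch is that neither source needed to know about
the other; the shared department URI is what joins them.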
It would seem that the benefits from the Semantic Web are gained when
you wish to merge data from disparate sources. There is then a question
of who should fund the investment to do this.
My thoughts - which may contain errors due to my flawed understanding
of the Semantic Web.
Brian
PS In response to your question, where are the tools - in the example I
gave the metadata is the data so there isn't a need for metadata
management tools.
> Best wishes,
>
> Stephen...
>
> Stephen Emmott
> Projects Director (Editor in Chief, LSE website)
> Business Systems & Services, LSE
>