Phil Shaw wrote:
> MKDoc Ltd. would like to announce the first beta release of
> MKSearch, under the GNU General Public Licence. Source and
> pre-compiled binary downloads are available from the project Web
> site.
>
> http://www.mksearch.mkdoc.org/downloads/
>
>
> MKSearch is a metadata search engine that indexes structured
> metadata in Web documents, not free text in the document body.
> The data acquisition system:
>
> * Conforms to the Dublin Core metadata in HTML
> recommendations [1]
>
> * Supports other application profiles, such as the UK e-Government
> Metadata Standard [2]
>
> * Indexes native RDF formats, including RSS 1.0
[snip]
> 3. A set of custom indexers based on the Simple API for XML (SAX)
>
> * Extracts metadata from HTML meta and link elements
> * Converts metadata to RDF triple statements
> * Configurable application profiles
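The announcement doesn't show how MKSearch's indexers work. As a rough illustration of the "meta and link elements to RDF triples" step, here is a minimal SAX sketch. It assumes well-formed XHTML and the DC-in-HTML convention of declaring a prefix with `<link rel="schema.DC" href="...">` and then using it in `<meta name="DC.title" ...>` or `<link rel="DC.relation" ...>`. The class and function names, the sample page and the tuple-based triple representation are all hypothetical; this is not MKSearch's code.

```python
import xml.sax


class DCMetaHandler(xml.sax.ContentHandler):
    """Hypothetical handler: collects meta/link elements, then resolves them to triples."""

    def __init__(self, doc_uri):
        super().__init__()
        self.doc_uri = doc_uri
        self.schemas = {}   # prefix -> namespace URI, from <link rel="schema.X">
        self.pending = []   # (prefix, term, value, is_resource), resolved at the end

    def startElement(self, name, attrs):
        local = name.split(":")[-1]  # tolerate prefixed names such as "html:meta"
        if local == "link":
            rel = attrs.get("rel", "")
            href = attrs.get("href")
            if rel.startswith("schema.") and href:
                self.schemas[rel[len("schema."):]] = href
            elif "." in rel and href:
                prefix, term = rel.split(".", 1)
                self.pending.append((prefix, term, href, True))   # object is a URI
        elif local == "meta":
            meta_name = attrs.get("name", "")
            content = attrs.get("content")
            if "." in meta_name and content is not None:
                prefix, term = meta_name.split(".", 1)
                self.pending.append((prefix, term, content, False))  # object is a literal

    def triples(self):
        # Resolve prefixes only after parsing, so schema links may appear anywhere in head.
        out = []
        for prefix, term, value, is_resource in self.pending:
            ns = self.schemas.get(prefix)
            if ns is None or "." in term:
                continue  # undeclared prefix or refined term (e.g. "title.alternative"): skipped
            obj = ("uri", value) if is_resource else ("literal", value)
            out.append((self.doc_uri, ns + term, obj))
        return out


def extract(xhtml, doc_uri):
    handler = DCMetaHandler(doc_uri)
    xml.sax.parseString(xhtml.encode("utf-8"), handler)
    return handler.triples()


# Illustrative input only.
SAMPLE_PAGE = """<html xmlns="http://www.w3.org/1999/xhtml">
<head>
  <link rel="schema.DC" href="http://purl.org/dc/elements/1.1/" />
  <meta name="DC.title" content="MKSearch" />
  <link rel="DC.relation" href="http://www.mksearch.mkdoc.org/downloads/" />
</head>
<body/>
</html>"""

if __name__ == "__main__":
    for triple in extract(SAMPLE_PAGE, "http://example.org/page"):
        print(triple)
```

Resolving prefixes after the whole document has been read keeps the handler independent of element order; a real indexer would also need to cope with tag-soup HTML, which a strict XML parser rejects.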
Good stuff.
I confess I haven't looked at the options for configuring your
extractor, but I wondered whether you had considered adding support for
GRDDL [1]?
That way you wouldn't be limited to extracting RDF data embedded
according to the conventions described in the DC-in-X/HTML spec: you
could extract RDF data embedded according to any set of conventions that
is identified by an HTML profile and is GRDDL-enabled (see e.g. the
recent thread here on "Naked Metadata", especially Alan Cox' message
[2], and Ian Davis' "Embedded RDF" [3] as an example). And once DCMI
gets around to GRDDL-enabling the DC-in-X/HTML spec, which is in the
pipeline, that would cover the DC-in-X/HTML case too.
It seems to me GRDDL offers a very flexible approach to encoding RDF
data in, and extracting it from, XHTML (and indeed other XML formats).
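To make the GRDDL suggestion a little more concrete, here is a minimal sketch of the discovery step only: checking that an XHTML page's `head` carries the GRDDL `data-view` profile and collecting the `link rel="transformation"` URIs it declares. It assumes well-formed XHTML in the XHTML namespace, and it looks only at `link` elements in `head`; transformations reached via a profile document are not covered. The function name, the sample page and the `/xsl/embedded-rdf.xsl` path are hypothetical. Actually applying the (XSLT) transformations to get RDF would need an XSLT processor, which the Python standard library doesn't provide.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

XHTML_NS = "{http://www.w3.org/1999/xhtml}"
DATA_VIEW = "http://www.w3.org/2003/g/data-view"  # GRDDL profile URI


def grddl_transformations(xhtml, base_uri):
    """Hypothetical helper: absolute URIs of the GRDDL transformations a page declares."""
    root = ET.fromstring(xhtml)
    head = root.find(XHTML_NS + "head")
    if head is None:
        return []
    # The head/@profile attribute is a whitespace-separated list of URIs.
    if DATA_VIEW not in head.get("profile", "").split():
        return []  # the page hasn't opted in to GRDDL
    found = []
    for link in head.iter(XHTML_NS + "link"):
        rels = link.get("rel", "").split()
        href = link.get("href")
        if "transformation" in rels and href:
            found.append(urljoin(base_uri, href))  # resolve relative hrefs
    return found


# Illustrative input only; the stylesheet path is made up.
EXAMPLE_PAGE = """<html xmlns="http://www.w3.org/1999/xhtml">
<head profile="http://www.w3.org/2003/g/data-view">
  <title>Example</title>
  <link rel="transformation" href="/xsl/embedded-rdf.xsl" />
</head>
<body/>
</html>"""

if __name__ == "__main__":
    print(grddl_transformations(EXAMPLE_PAGE, "http://example.org/doc"))
```

The point of the design is that the extractor needs no built-in knowledge of any particular embedding convention: each page (or its profile) names the transformation that turns it into RDF.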
Pete
--
Pete Johnston
Research Officer (Interoperability)
UKOLN, University of Bath, Bath BA2 7AY, UK
tel: +44 (0)1225 383619 fax: +44 (0)1225 386838
http://www.ukoln.ac.uk/ukoln/staff/p.johnston/
[1] http://www.w3.org/2004/01/rdxh/spec
[2] http://www.jiscmail.ac.uk/cgi-bin/webadmin?A2=ind0511&L=dc-general&P=233
[3] http://research.talis.com/2005/erdf/wiki/Main/RdfInHtml