Here follows a piece from TidBITS about meta-content.
TidBITS is a very large electronic magazine for the Mac.
What I miss in it is any reference to SGML.
Greetings
<< start of forwarded material >>
Date: Mon, 25 Nov 1996 20:21:15 -0800
From: [log in to unmask] (TidBITS Editors)
Subject: TidBITS#355/25-Nov-96
To: [log in to unmask] (TidBITS Distribution)
Mime-Version: 1.0
Precedence: Bulk
Reply-To: [log in to unmask]
X-Subscribe: To subscribe to TidBITS, send email to <[log in to unmask]>.
X-Unsubscribe: To unsubscribe, send email to <[log in to unmask]>.
X-TidBITS-URL: <http://www.tidbits.com/>
**Meta-content?** Many of us just got used to the idea that
information businesses are now "content providers," and now we're being
asked to understand "meta-content?" It sounds like a
technogeek term from hell, but it's not so bad.
The American Heritage Dictionary defines the prefix "meta-" in
part as meaning "beyond, transcending, more comprehensive."
Engineers like using the prefix to describe the process of
referring to a process. For example, a joke about a joke would be
"meta-humor," and a language invented to describe other languages
is a "meta-language." (The concept is discussed thoroughly in the
1979 Pulitzer Prize-winning book Goedel, Escher, Bach: An Eternal
Golden Braid by Douglas Hofstadter - a must-read for engineering
or science enthusiasts.) Following this tradition, meta-content is
content that talks about other content.
MCF, as defined by Apple's R.V. Guha (who is responsible for both
HotSauce and MCF) is a "language for representing a wide range of
information _about_ content." A simple example of meta-content is
the header on an email message. It tells you information _about_
the message (who sent it, at what time, how it got to you, where
replies should go, and more) but it's not the message itself: the
person who sent you the mail wasn't sending you the header, but
was sending the content of the message.
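The split between header and content can be shown with a minimal sketch using Python's standard email package. The addresses and body text below are made up for illustration; only the Subject and Date values echo the header of this forwarded issue.

```python
from email import message_from_string

# A made-up message: the addresses and the body are placeholders.
raw = """\
From: editors@example.com
To: reader@example.com
Subject: TidBITS#355/25-Nov-96
Date: Mon, 25 Nov 1996 20:21:15 -0800
Reply-To: replies@example.com

This is the content the sender actually meant to send.
"""

msg = message_from_string(raw)

# Meta-content: information *about* the message.
headers = dict(msg.items())

# Content: the message itself, with no header lines in it.
body = msg.get_payload()

for name, value in headers.items():
    print(f"{name}: {value}")
print("---")
print(body)
```

A mail program can sort, filter, or route on the header dictionary alone without ever reading the body.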
**Why Describe Content?** Email headers can be described as a
simple _language_ for describing the content of an email message.
A language, for these purposes, is a set of simple rules that
define valid expressions - in the normal language of mathematics,
for example, "4 + 4" is a valid expression but "76#&98+A!" isn't.
Email would be less useful if there were no headers: any sender
would have to be sure to include the header information in the
body of the message or you, as recipient, would never see it. A
lack of a signature would leave you clueless as to the message's
origin (and a false signature could mislead you further).
So, describing content is a useful pursuit. In fact, when you have
lots and lots of content, navigating through it is next to
impossible without some form of meta-content. Millions of people
turn to Yahoo to find Web pages sorted into useful (if somewhat
arbitrary) categories and classes. The same people could turn to
AltaVista to search millions of Web pages by content, but
searching by content is often less useful when you're browsing. If
you want to find magazines about the Macintosh, you can dive
through Yahoo until you get to a list of some 30 separate Web
sites on the subject. Searching for "Macintosh magazine" in
AltaVista returns about 400,000 matches, including job listings at
Macworld, dozens of pages from the MacToday site, articles from
old issues of Byte in Italian and so on. The raw text searching
capability returns thousands of times more matches, but they're
not as useful as Yahoo's more limited set.
<http://www.yahoo.com/>
<http://altavista.digital.com/>
Once you have a good description of some kind of content, that
meta-content can be effectively and efficiently searched with
excellent results. The major problem is that - so far - good
meta-content comes only from actual people. Technology is getting
better at this - Apple demonstrated agents that distill text
documents into one sentence at Macworld Boston - but humans can
still do much better. Publishers often create library card catalog
entries for books to assist librarians - without that help,
libraries have no way of knowing a book's contents except by
jacket blurbs or the table of contents, and it's rare to find a
library with enough resources to hire a librarian just to read
books and catalog them properly.
In a similar vein, the trend in Web publishing is towards self-
description of Web pages. Assuming you're honest, you can
describe your page in 25 words more accurately than
someone at Yahoo, and much more accurately than a text retrieval
system. The HTML 3.2 standard includes a META keyword so you can
add some meta-content information to your Web pages to assist with
automatic indexing and other meta-content creation activities.
<http://www.w3.org/pub/WWW/MarkUp/Wilbur/>
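As a rough sketch of how an indexer might read that self-description, the Python below collects META name/content pairs with the standard html.parser module. The sample page and the property names "description" and "keywords" are assumptions chosen for illustration; they are common conventions, not something the article specifies.

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collects NAME/CONTENT pairs from META tags into a dictionary."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        content = attributes.get("content")
        if name and content is not None:
            self.meta[name.lower()] = content


# A hypothetical page; the site and its wording are invented.
page = """<HTML><HEAD>
<TITLE>Example Mac Magazine</TITLE>
<META NAME="description" CONTENT="A weekly magazine about the Macintosh.">
<META NAME="keywords" CONTENT="Macintosh, magazine, news">
</HEAD><BODY>Job listings, back issues, and more.</BODY></HTML>"""

collector = MetaCollector()
collector.feed(page)
print(collector.meta)
```

The indexer ends up with the author's own summary instead of having to guess one from the body text.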
**But Why MCF?** Individually generated pieces of meta-content are
useful, but when you describe collections of hundreds of thousands
of pieces of content, you must have some standards. Let's extend
the example of a library card catalog to one that's being
computerized. When you look at a book's card, you can easily see
if the book has 27 authors (perhaps it's an anthology). If you
enter that information into a database, though, if there are only
three "author" fields, you're stuck - you either leave out 24
authors or you enter them in an unrelated field, such as
"description." Either way will foil people searching for books by
one of those 24 people (who's going to search the description
field for an author?). Large meta-content systems must be
flexible; in fact, the MARC format used by the Library of Congress
consists of a set of tagged data - you can have as many author
tags and authors as you want for any particular entry, limited
only by your particular computer's capability to store them.
<http://lcweb.loc.gov/marc/>
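The difference between fixed fields and repeatable tags can be sketched in a few lines of Python. This is not the MARC record format (which uses numeric tags and its own encoding); the field names, tag names, and helper functions are invented purely to illustrate the 27-author problem.

```python
# 27 placeholder authors for a hypothetical anthology.
AUTHORS = [f"Author {n}" for n in range(1, 28)]


def fixed_record(title, authors, slots=3):
    """Fixed-field record: authors beyond `slots` are silently lost."""
    record = {"title": title}
    for i in range(slots):
        record[f"author{i + 1}"] = authors[i] if i < len(authors) else None
    return record


def tagged_record(title, authors):
    """Tagged record: a list of (tag, value) pairs; any tag may repeat."""
    return [("title", title)] + [("author", a) for a in authors]


def values(record, tag):
    """Return every value stored under `tag` in a tagged record."""
    return [v for t, v in record if t == tag]


fixed = fixed_record("An Anthology", AUTHORS)
tagged = tagged_record("An Anthology", AUTHORS)

print(len([v for k, v in fixed.items() if k.startswith("author")]))
print(len(values(tagged, "author")))
```

A search for "Author 27" fails against the fixed record but succeeds against the tagged one, because no author had to be dropped or moved to an unrelated field.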
So why not use an existing format like MARC to describe content on
the Web as well? MARC is not an open standard. The "tags" used to
indicate what each given entry contains are in fact numbers; and
numbers not published are reserved for the MARC committee's
definition, with only some exceptions. Further, MARC records
include binary data and aren't easily human-readable. By contrast,
Guha's MCF format is more like HTML. Consider that in HTML, a Web
page author can invent her own tags. If someone's browser doesn't
know how to interpret them, they'll just be ignored. If a browser
does interpret them, then the page can include nifty new features.
Netscape does this with nearly every release of Navigator.
Apple's hoping MCF has a similar reception - it's a simple, text-
based format that defines objects and their properties. There are
no restrictions on what properties are described for each object,
nor are there requirements that all properties be described or
that all relationships between objects be included. HotSauce's
implementation of MCF only handles a few properties for each
object: "parent" objects, "child" objects, suggested locations
where the children might appear in the 3-D fly-by in relation to
the parent objects, and that's just about it. You can get the
white paper on MCF at the URL below.
<http://applenet.apple.com/hotsauce/text/mcf.html>
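The idea of open-ended properties, with a viewer that uses only the ones it understands, might look like the Python sketch below. This is not MCF syntax; the object names, property names, and the example URL are all assumptions made for illustration, and the white paper remains the reference for the real format.

```python
# Hypothetical objects with arbitrary properties. Nothing here is
# required, and "mood" is a property no viewer has heard of.
objects = {
    "home": {"name": "Home", "children": ["news", "archive"], "mood": "cheerful"},
    "news": {"name": "News", "parent": "home", "url": "http://example.com/news"},
    "archive": {"name": "Archive", "parent": "home"},
}

# The small set of properties this imaginary viewer knows how to use.
KNOWN = {"name", "parent", "children"}


def understood(obj):
    """Keep the properties the viewer knows; quietly skip the rest."""
    return {key: value for key, value in obj.items() if key in KNOWN}


for key, obj in objects.items():
    print(key, understood(obj))
```

As with unknown HTML tags, the extra properties do no harm: a simple viewer ignores them and a richer one is free to use them.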
Apple has submitted MCF to the Internet Engineering Task Force
(IETF) for consideration as an Internet standard for describing
content, and I'm unaware of any similar counter-proposals. If the
IETF does accept MCF as a standard, we can presume there will be a
set of standard attributes for describing data (common things like
"name", or "URL"; maybe a "description", or "creator", or other
similar tags), but extra data can still be included.
**What Will MCF Do For Us?** In case you're digesting all this
with a resounding "Big deal!" building in the back of your throat,
you have to realize that most standards are boring - it's what's
done with them that's interesting.
Think of HTML. The idea of marking up text with more text that
indicates what the original text should look like is, well, a
silly idea. It's not a compact way to indicate stylistic changes
(a "bold" command can be expressed in less than one byte, rather
than the lengthy <STRONG> tag), HTML source is not easy to read,
and it's not suitable for advanced page descriptions.
But HTML is easy for computers to work with, it's extensible (as
we've seen), and the simple hypertext capabilities that link a
phrase on a page to a completely different page led to the Web
browser, which led to today's World Wide Web, which has been noted
to be a Really Big Deal.
MCF has the same features - it's easy to create, easy to use, and
easy for computers to work with. For lack of a better term, I
envision an MCF "browser" program that can navigate through any
collection of MCF-described data. Apple's HotSauce Web site has
several such MCF collections, called "X Spaces" because of the
early Project X name. If you have Apple's Netscape plug-in for
HotSauce, you can fly through any of these X Spaces in your Web
browser.
<http://applenet.apple.com/hotsauce/>
You can also download a stand-alone HotSauce application and view
X Spaces that way. It includes a choice of viewing formats
recently added to the plug-in - the 3-D fly-through method, or a
two-dimensional Finder-like view with folders and disclosure
triangles that reveal folder contents when clicked, just like the
Finder's View by Name capability. Note that the MCF file
describing the data didn't change; the program is just viewing it
in a different way.
That is the real key - a single way of describing a large set of
data can be displayed in whatever fashion a programmer can invent.
The current HotSauce visual interface isn't all that impressive in
today's age of 3-D rendered graphics, but it's just a way of
looking at MCF data - it would be relatively easy to create a
different interface to the same data.
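To make the point concrete, here is a small Python sketch in which one description of a hierarchy is displayed two different ways. The tree contents and both view functions are invented for illustration and have nothing to do with how HotSauce is actually implemented.

```python
# One description of the data: each item maps to its children.
TREE = {
    "Macintosh": ["Magazines", "Software"],
    "Magazines": ["TidBITS", "MDJ"],
    "Software": [],
    "TidBITS": [],
    "MDJ": [],
}


def outline_view(tree, root, depth=0):
    """Indented outline, loosely like a Finder list view."""
    lines = ["  " * depth + root]
    for child in tree[root]:
        lines.extend(outline_view(tree, child, depth + 1))
    return lines


def path_view(tree, root, prefix=""):
    """Flat list of full paths, a second way to show the same data."""
    path = f"{prefix}/{root}"
    paths = [path]
    for child in tree[root]:
        paths.extend(path_view(tree, child, path))
    return paths


print("\n".join(outline_view(TREE, "Macintosh")))
print("\n".join(path_view(TREE, "Macintosh")))
```

Neither function changes TREE; adding a third view means writing a third function, not a new description of the data.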
If every Web site generated an MCF description of itself, you
could fly through any site and find the information most relevant
to you, without using a site map (which may not be useful at all;
some Web site maps are woefully inadequate), or search the site as
if it's a Finder window. An MCF-viewing Live Object would add that
capability to any OpenDoc container on your Macintosh.
The same MCF browser or viewer part could take you through your
own hard disk, through Yahoo's Web pages, through every Web site
with an MCF description - even through a database that has an MCF
description (imagine browsing huge databases as you could your own
hard disk!) - through just about anything at all.
That's why Apple executives say MCF will do for databases what
HTML does for text. If it's adopted by the world at large, as an
IETF standard or otherwise, they could be right.
**Competing Meta-Content Standards** -- There have been other
efforts to create a standard description for content, but none has
a company like Apple behind it. Further, Apple's MCF inventor,
R.V. Guha, has built upon the work of committees investigating
such possibilities, including the Dublin Core group that has a
preliminary standard. MCF is in its early stages; though Dublin
Core is a little bit more academically inclined and bears a
resemblance to library cataloging structures, Guha's white paper
says there's no reason why the benefits of Dublin Core can't be
expressed in MCF with some work to define a syntax.
What about Microsoft's Nashville project? Nashville is the code
name for Microsoft's "Internet Add-On Pack," expected to come soon
for Windows 95 and Windows NT platforms (apparently now also
called "Active Desktop"). It's been described in the press as
"building the browser into the operating system," and is supposed
to include a way to let you view your hard disk as a Web page,
complete with hyperlinks. It does exactly that, according to my
research.
What Nashville does _not_ do is describe both Web pages and hard
disk contents in a meta-content format, then use an MCF-like
technology to view both. Nashville replaces (or adds to) Windows'
desktop program (their Finder, if you will) by sharing code with
Microsoft Internet Explorer 4.0. If you move the discussion to
more familiar Macintosh terms, then with Nashville, Web windows
could open in the Finder without launching a separate browser
program (just like sounds and clippings files), and you could even
change your desktop to display live Web content instead of just
file icons and Finder windows. You could also embed Finder-like
panes into Web pages or documents.
Microsoft does all this without a meta-content format by using an
ActiveX control to display file and folder views inside Internet
Explorer windows. The browser itself doesn't know anything about
the hard disk; it just knows about ActiveX and has an ActiveX
component that knows about the hard disk. (In our earlier example,
OpenDoc wouldn't know about MCF, but an MCF Live Object could give
that functionality to every OpenDoc document.)
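The division of labor described here, a host that knows only a component interface and a component that knows the data, can be sketched generically in Python. The class names are hypothetical and this is neither the ActiveX nor the OpenDoc API; the folder contents are a hard-coded placeholder list.

```python
class Component:
    """The only thing the host knows: a component can render itself."""

    def render(self):
        raise NotImplementedError


class FolderView(Component):
    """Knows about folder contents; the host never looks inside."""

    def __init__(self, entries):
        self.entries = entries

    def render(self):
        return "\n".join(sorted(self.entries))


class Host:
    """Stands in for the browser or container document."""

    def __init__(self):
        self.components = []

    def embed(self, component):
        self.components.append(component)

    def display(self):
        return [component.render() for component in self.components]


host = Host()
host.embed(FolderView(["System Folder", "Applications", "Documents"]))
print(host.display()[0])
```

Swapping FolderView for a component that reads a meta-content description would give the host that capability without any change to Host itself.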
Nashville's technology is nice. A future version of the Macintosh
OS could go even further with OpenDoc because OpenDoc can embed
any Live Object, where Nashville appears to embed only ActiveX
controls inside Web browser windows or panes (you couldn't, as I
read Microsoft's descriptions, have a large spreadsheet with some
Web content embedded in it, but you could have a large Web page
with spreadsheet content embedded in it).
Nashville is likely to be available before IETF does serious work
with MCF, but since the two are not competing standards, that
shouldn't make any difference except in public perception.
Microsoft hasn't come out against MCF, and if it takes off as it
could, Microsoft will probably embrace MCF as quickly as any other
Internet-savvy company.
Access to data isn't a problem anymore, but finding _useful_ data
is becoming extremely difficult with the proliferation of sources.
MCF is a potential way to make the growing Internet a little more
manageable, and I can see why Apple is excited about it.
[This article is reprinted with permission from MDJ, a daily
Macintosh publication covering news, products, and events in the
Macintosh world. If you can't get enough insightful Mac news, sign
up for a trial subscription to MDJ. For more information and the
free MDJ Recap #1 in setext or Acrobat PDF format, visit the MDJ
Web site at <http://www.gcsf.com/>.]
$$
Non-profit, non-commercial publications may reprint articles if
full credit is given. Others please contact us. We don't guarantee
accuracy of articles. Caveat lector. Publication, product, and
company names may be registered trademarks of their companies.
This file is formatted as setext. For more information send email
to <[log in to unmask]>. A file will be returned shortly.
For information on TidBITS: how to subscribe, where to find back
issues, and other useful stuff, send email to: <[log in to unmask]>
Send comments and editorial submissions to: <[log in to unmask]>
Issues available at: ftp://ftp.tidbits.com/pub/tidbits/issues/
And: http://www.tidbits.com/tb-issues/
To search back issues with WAIS, use this URL via a Web browser:
http://wais.sensei.com.au/macarc/tidbits/searchtidbits.html
-------------------------------------------------------------------
<< end of forwarded material >>
Gerard Freriks,huisarts, MD
C. Sterrenburgstr 54
3151JG Hoek van Holland
the Netherlands Telephone: (+31) (0)174-384296/ Fax: -386249
Mobile : (+31) (0)6-54792800
ARS LONGA, VITA BREVIS
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%