At Talis' Nodalities, Gavin Carothers and Charles Greer describe the
process that O'Reilly went through to discover that they (surprise!)
needed to use Dublin Core, and express their metadata in RDF. A brief
excerpt to give you the flavor...
In the process of trying to create an XML format we asked a number of
people in the company how to find the Publication Date for a book. The
answer was surprisingly complex. The value was computed independently
by each of the ETL hydras, with subtly different implementations that
had evolved with particular client needs. O’Reilly isn’t a huge
company with layer upon layer of bureaucracy; most questions can be
quickly answered with a chat at a desk or an email to the other coast.
Imagine our surprise, then, at the results of the Publication Date
poll. Most people were confident that one of five dates was the right
date, but disagreed on which of the five it was. Retail Availability
Date, Actual In Stock Date, Estimated In Stock Date, etc., each had its
backers. What was really going on was that we had discovered the subtly
different needs that each business unit had. The strategy we could
most easily support? Consensus on a public standard. As we’ve
learned so many times, we needed to go outside the company to find the
correct solution. Public standards, specifications, and ontologies
could save us from ourselves.
Enter: Dublin Core. We couldn’t define our own format or use the
industry standard (ONIX), nor could we agree on what a publication
date was. Our only choice was to borrow/steal some other group’s
ideas. It turns out that our problems had already been solved by the
library community. The Dublin Core Metadata Initiative created
standards, guidelines, and examples for storing and sharing basic,
essential metadata. We had a way out: here was a group of people who’d
already done a great deal of thinking for us.
Of course, they hadn’t done all our thinking for us. Mapping all of
our old data into well-designed and well-documented Dublin Core, MARC
Relators, FOAF, or any other ontology was going to be hard. So we
didn’t do it. Instead we mapped the whole of our old, horrible, ugly
mess into an undefined ontology called the “Product Database Legacy
Ontology.” We then moved some of the more obvious items like title and
author into Dublin Core and waited. Only once we had a proven need for
a new data point in a real application would we go through the process of
researching, defining, cleaning, and moving it into a modern, public
ontology. For those following along closely: no, trim color isn’t yet
in the public or internal metadata. As it turns out, no one really
wanted it. At least, not yet.
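
The staged migration described in the excerpt can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the LEGACY namespace URI, the legacy property names, and the sample book are hypothetical, and mapping "author" to the Dublin Core "creator" term is a guess at what such a mapping might look like, not something the article states. Triples are modeled as plain tuples rather than with an RDF library.

```python
# Hypothetical namespace for the "Product Database Legacy Ontology";
# the real URI is not given in the article.
LEGACY = "http://example.org/pdb-legacy#"
# Namespace of the DCMI Metadata Terms vocabulary.
DCTERMS = "http://purl.org/dc/terms/"

# Only properties with a proven need get a researched public equivalent.
# Everything else stays in the legacy ontology until someone asks for it.
PROMOTED = {
    LEGACY + "title": DCTERMS + "title",
    LEGACY + "author": DCTERMS + "creator",  # assumed mapping
}


def promote(triples):
    """Rewrite predicates that have a public mapping; leave the rest untouched."""
    return [(s, PROMOTED.get(p, p), o) for s, p, o in triples]


# Hypothetical sample data as (subject, predicate, object) tuples.
book = "http://example.org/product/1"
legacy_triples = [
    (book, LEGACY + "title", "Example Title"),
    (book, LEGACY + "author", "Example Author"),
    (book, LEGACY + "trim_color", "blue"),
]

migrated = promote(legacy_triples)
for subject, predicate, obj in migrated:
    print(predicate, "->", obj)
```

Title and author come out under Dublin Core predicates, while trim color stays in the legacy namespace. Promoting another property later means adding one entry to PROMOTED, which mirrors the "wait for a proven need" approach.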