Mikael Nilsson wrote in reply to Jon Phipps:
> Well... what I'm saying is that you do have the freedom to choose your
> model, and in all cases be able to use the legacy literals for some
> useful purpose.
>
> Still, there might be *other* reasons to use the simpler, literal-based
> model. The fact that the legacy data uses literals doesn't mean that the
> simpler model is a requirement.
>
>
I think you're correct that there are a variety of reasons to ensure
that a "simpler, literal-based model" is possible with RDA. For one
thing, in our years of working with data providers in NSDL, we found that
most newbie data providers started out in the same place, with simple
literals. Some were able to move up from there; some never had the
interest or motivation. DCMI's experience with beginners has been
similar.
But I think it's pretty clear that for RDA to get the uptake it needs
for success, we must consider the simpler model a requirement. It's not
something that needs to be our primary focus at this stage, but in
getting the formal representation of the elements "right" we need to
keep the big picture in the back of our minds. I firmly believe that
judgment of our ultimate success will rest on both of those pillars:
a formal representation that is "right," and a simple model that makes
broad uptake possible.
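
To make the contrast concrete, here's a rough sketch of the same
legacy value under both models (Python with rdflib; the element
namespace and every URI are made-up placeholders, not the actual RDA
vocabulary):

    from rdflib import Graph, Literal, Namespace, URIRef
    from rdflib.namespace import RDF

    EX = Namespace("http://example.org/rda/")   # hypothetical elements
    g = Graph()
    rec = URIRef("http://example.org/record/1")

    # Simple, literal-based model: the value is just the legacy string.
    g.add((rec, EX.placeOfPublication, Literal("London")))

    # Richer model: the value is a resource with a URI, and the legacy
    # string rides along as one extra triple on that resource.
    place = URIRef("http://example.org/place/london")
    g.add((rec, EX.placeOfPublication, place))
    g.add((place, RDF.value, Literal("London")))

The richer form costs exactly the extra triples shown, which is the
zero-cost case Mikael describes below.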
>
>> But this isn't really the best level for discussion of the problem.
>> The problem is the inherent and undeniable literalness of legacy data
>> and not so much what model to apply, but what strategy to use to
>> approach the owners of that data. The question is: What is the most
>> effective (maybe not the best) strategy to follow to accommodate
>> legacy data? Do we create a model that somewhat matches the existing
>> data, or do we try to force existing data to fit an ideal model? From
>> my long experience with translating and aggregating data across
>> systems, it's not usually an effective strategy to try to force
>> producers of data to conform to a model they can't easily and
>> _cheaply_ comply with.
>>
>
> Well, in the examples I gave, the only cost would be generating an extra
> triple for the RDF data. I fail to see the cost there.
>
> If, however, there are incompatibilities that need manual conversion,
> well, that would be an issue. My examples show two cases where the cost
> is zero. It should be an interesting exercise to find the cases where
> the cost is non-zero.
>
>
There will be data coming over from the legacy records that cannot be
converted easily (or at all) to URIs. Names, subjects,
geographics--yes, we know how to do this, and I fully expect that we
will, as a community, figure out how to do it for most of the data.
There will be far tougher questions to solve for other data elements,
and I hope as we move forward on this work and gather some good examples
of how this transition will actually occur, we'll be able to focus those
discussions more concretely. I hope to have a few of these examples
within the next few weeks, coming out of another project I'm working on
where we are thinking about attempting to transform legacy data into an
RDA-like schema.
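
To sketch what I mean by that split (Python with rdflib again; the
lookup table and every URI below are invented stand-ins for real
authority reconciliation, not a proposal):

    from rdflib import Graph, Literal, URIRef
    from rdflib.namespace import DC

    # Hypothetical stand-in for a real name-authority lookup.
    AUTHORITY = {
        "Austen, Jane, 1775-1817":
            URIRef("http://example.org/auth/austen-jane"),
    }

    g = Graph()
    rec = URIRef("http://example.org/record/2")

    for legacy_name in ["Austen, Jane, 1775-1817",
                        "local arts collective"]:
        uri = AUTHORITY.get(legacy_name)
        if uri is not None:
            # Convertible: state the value as a URI.
            g.add((rec, DC.creator, uri))
        else:
            # Not (yet) convertible: keep the legacy literal rather
            # than lose the data.
            g.add((rec, DC.creator, Literal(legacy_name)))

The far tougher questions are about what eventually happens to the
strings that keep landing in that else branch.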
>> A more effective strategy would provide for flexibility in data
>> production and shift the cost of interoperability to the data
>> consumer. Once you have achieved broad acceptance of the standard,
>> you can introduce refinements to it that are designed to improve the
>> quality of the data produced and incrementally shift the
>> interoperability cost back to the producer. This is the experience of
>> html/xhtml, rss/atom, and as I pointed out, Simple/Qualified DC which
>> actually embodied it as an explicit and ongoing function of the
>> standard.
>>
>
> From that standpoint, the starting point would be a direct RDF version
> of MARC21? And then evolution from there?
>
That's one strategy, but not the only one. Because MARCXML already
exists and expresses the full complexity of MARC21, it provides (in
my opinion) a much better place to start any kind of transformation to
RDA. I could be wrong about that, but I hope we'll be able to engage
some of the community to experiment with these ideas within the next few
months.
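
To give a flavor of that, here's a bare-bones sketch of the MARCXML
route (the target property is a made-up placeholder for whatever the
RDA element vocabulary finally gives us):

    import xml.etree.ElementTree as ET
    from rdflib import Graph, Literal, Namespace, URIRef

    MARC = "{http://www.loc.gov/MARC21/slim}"
    EX = Namespace("http://example.org/rda/")  # hypothetical elements

    record_xml = """
    <record xmlns="http://www.loc.gov/MARC21/slim">
      <datafield tag="245" ind1="1" ind2="0">
        <subfield code="a">Pride and prejudice /</subfield>
      </datafield>
    </record>
    """

    root = ET.fromstring(record_xml)
    g = Graph()
    rec = URIRef("http://example.org/record/3")

    # MARC 245 $a is the title proper; carry it over as a literal,
    # trimming the trailing ISBD punctuation.
    path = (MARC + "datafield[@tag='245']/" +
            MARC + "subfield[@code='a']")
    for subfield in root.findall(path):
        g.add((rec, EX.titleProper, Literal(subfield.text.rstrip(" /"))))

The real exercise involves hundreds of fields and much messier
punctuation, of course; the point is only that MARCXML gives us a
lossless, concrete structure to transform from.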
> I suppose there are two levels here - how large a step will RDA itself
> take, and how literally will we implement the RDA analysis in this work.
> I'd say we need to start with what RDA gives us, not start with existing
> library data - that's not our task.
>
>
Yes, agreed. But as I mentioned, I think it does us no good to ignore
the legacy data or the thinking that created it; understanding both is
pretty essential to the successful completion of our task. Some of us
have been suggesting that RDA should be taking a bigger step into the
future, and though we've seen some good progress in that direction, the
issue is by no means completely played out yet in the development
discussions. This is, of
course, one of the challenges--what you can see published in the
documentation is not a complete picture of where the thinking on these
issues has progressed. What that says to me is that "starting with what
RDA gives us" is a classic instance of attempting to nail jello to a
wall. The RDA documentation gives us a starting point, but there are a
number of important gaps that the development of textual guidance (what
the RDA development effort is mostly about at this stage) will not fill
in for those of us working on the DCMI/RDA Task Group.
This is, of course, the challenge of doing what we're attempting to do,
and doing it well. An essential part is to find those gaps, make sure we
understand the implications of trying to fill them in, and not allow the
fear of failure or the desire for perfection to stop us from moving
forward.
Regards,
Diane