The hardest part is actually generating the metadata in the first place,
and making sure your markup is consistent. Consistency in your original
markup is the most important factor, both in how you format the
metadata and in which words you use.
For example, the difference between "Report" and "report" or "Reports"
is enough to have you tearing your hair out down the track when you're
wondering why half your documents disappear. The same goes for
"Electronic Correspondence" and "Electronic Mail". To you they look
different, but to two different people who haven't got a common standard
to follow, they're both the "authoritative" term for what they're describing.
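To make that concrete, here is a rough Python sketch of the sort of check I mean: fold case and whitespace, then map every variant onto one preferred term. The vocabulary table and the function names are made up for illustration, and which term counts as "preferred" is an arbitrary choice here; your own standard decides that.

```python
# Hypothetical controlled vocabulary: lower-cased variant -> preferred term.
# The entries and the choice of preferred term are assumptions for illustration.
PREFERRED_TERMS = {
    "report": "Report",
    "reports": "Report",
    "electronic correspondence": "Electronic Mail",
    "electronic mail": "Electronic Mail",
}


def normalise_term(raw):
    """Return the preferred term for raw, or None if it is not in the vocabulary."""
    # Collapse runs of whitespace and ignore case before looking the term up.
    key = " ".join(raw.split()).lower()
    return PREFERRED_TERMS.get(key)


def check_terms(terms):
    """Split terms into (normalised, unknown) so a person can review the unknowns."""
    normalised, unknown = [], []
    for term in terms:
        preferred = normalise_term(term)
        if preferred is None:
            unknown.append(term)
        else:
            normalised.append(preferred)
    return normalised, unknown
```

Anything that lands in the unknown list is a term nobody has agreed on yet, which is exactly the stuff that makes documents "disappear" later.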
I guess what I'm really saying is that, from my personal experience:
1) Extracting embedded metadata is a piece of cake
(even if the formatting isn't all that consistent).
2) Populating the metadata correctly in the first place is
where you'll have problems.
3) Anything generated by your humans will be broken in more
ways than you have humans.
4) Anything generated by your computers will at least be
broken the same way.
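As a rough illustration of point 1, this is about all it takes to pull name/content pairs out of the meta tags in an HTML file with Python's standard html.parser module. It is a minimal sketch, not a finished tool: the class and function names are my own, and it assumes the metadata lives in plain meta elements with name and content attributes.

```python
from html.parser import HTMLParser


class MetaExtractor(HTMLParser):
    """Collect name/content pairs from the meta elements of an HTML document."""

    def __init__(self):
        super().__init__()
        self.metadata = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names for us.
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        content = attributes.get("content")
        if name and content is not None:
            # Lower-case the name and trim the value to absorb sloppy formatting.
            # The value itself is kept as written; extraction does not fix wording.
            self.metadata.setdefault(name.strip().lower(), []).append(content.strip())


def extract_metadata(html_text):
    """Return a dict mapping each meta name to the list of values found for it."""
    parser = MetaExtractor()
    parser.feed(html_text)
    parser.close()
    return parser.metadata
```

Feed it a page where one tag says "Report" and another says "report" and you get both values back untouched. The extraction is the easy bit; the inconsistent wording comes along for the ride.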
Alex
[log in to unmask] wrote:
>
> For reasons related to time, cost and other business priorities I am looking at
> starting with embedding metadata in html files in the short term with a desire
> to move to a repository in the longer term.
>
> Has anyone been through this process and can you offer any advice on any aspect
> of it?
>
> For example, I have been told that it would not be difficult to extract the
> embedded metadata at a later stage once a repository is available. Any comments?
>
> Thanks
>
> Kathleen Lazzari