draft-kunze-dc-00.txt - "Dublin Core Metadata for Simple Resource
Description" says:
4.12. Language Label: LANGUAGE
Language(s) of the intellectual content of the resource. Where
practical, the content of this field should coincide with the
NISO Z39.53 three character codes for written languages.
Though the use of this scheme may be widespread in the (US?) library
community, language labeling on the Internet doesn't use this scheme, but
rather that of ISO 639 - "Codes for the representation of names of
languages", together with ISO 3166 - "Codes for the representation of names
of countries". The application of these standards to the Internet is
specified by RFC 1766 - "Tags for the Identification of Languages" and by
RFC 2070 - "Internationalization of the Hypertext Markup Language". These
RFCs are, in turn, referenced by many others.
The language tags defined by RFC 1766 are multipart, e.g. "en" and "en-us",
but are interpreted as single tokens with no inner structure. RFC 2070
introduces the concept of a language hierarchy, which is especially useful
in the context of the Web. A user may search for documents which are in
"en-us" and get only those. Alternatively, s/he may search for documents in
"en" and get documents in "en", "en-us", etc.
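A minimal sketch of that hierarchical lookup follows. It assumes a simple prefix rule: a requested tag covers a document's tag if the two are equal, or if the requested tag is a prefix of the document's tag and the next character is "-". Comparison is case-insensitive, as RFC 1766 tags are. The function names and the sample documents are hypothetical and serve only as illustration.

```python
def matches(range_: str, tag: str) -> bool:
    """Hypothetical helper: True if `range_` covers `tag` under the prefix rule.

    Covers means equal (ignoring case), or a prefix of `tag` that ends
    exactly at a '-' boundary, so "en" covers "en-us" but not "eng".
    """
    r = range_.lower()
    t = tag.lower()
    return t == r or t.startswith(r + "-")


def filter_documents(range_: str, docs: list[tuple[str, str]]) -> list[str]:
    """Hypothetical helper: return titles of (title, tag) pairs covered by `range_`."""
    return [title for title, tag in docs if matches(range_, tag)]


# Illustrative documents, each labelled with a language tag.
docs = [
    ("A", "en"),
    ("B", "en-US"),
    ("C", "en-GB"),
    ("D", "fr"),
    ("E", "eng"),  # a three-letter code is not covered by "en"
]

print(filter_documents("en-us", docs))  # ['B']
print(filter_documents("en", docs))     # ['A', 'B', 'C']
```

Requiring the "-" boundary is what keeps "en" from accidentally matching an unrelated tag that merely starts with the same letters.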
As this approach is in widespread use on the Web, the adoption of a
different scheme, no matter how popular within a specific community, would
be most unfortunate.
Misha