Which set of language codes to use depends on your application. The
Ethnologue set is very comprehensive in that it attempts to identify all
languages. The ISO standards were developed for identifying major
languages for terminological and bibliographic purposes, and the goal was
never to include all languages. Certainly the ISO 3-character list
includes many more codes than the 2-character list. As stated, it does
group some minor languages into language groups when a separate code is
not warranted according to the criteria. The criteria for establishing
separate language codes are documented at:
For ISO 639-2:
http://lcweb.loc.gov/standards/iso639-2/criteria2.html
For ISO 639-1:
http://www.loc.gov/standards/iso639-2/criteria1.html
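
To make the relationship between these code sets concrete, here is a
minimal sketch in Python. The function name iso639_2 and the lookup
tables are hypothetical and hold only a few sample entries, not the
standards themselves. The sketch maps 2-character ISO 639-1 codes to
their 3-character ISO 639-2 equivalents, choosing either the
bibliographic (B) or terminological (T) form, and falls back to a
collective code for a language such as Dschang (Ethnologue code BAN, per
the message quoted further down) that has no individual ISO 639-2 code.

```python
from typing import Optional

# Hypothetical sample table (a few real entries, not the full standard):
# ISO 639-1 2-letter code -> (ISO 639-2/B code, ISO 639-2/T code).
ISO_639_1_TO_2 = {
    "en": ("eng", "eng"),  # B and T forms are identical here
    "fr": ("fre", "fra"),
    "de": ("ger", "deu"),
    "zh": ("chi", "zho"),
}

# Hypothetical fallback table: Ethnologue code -> ISO 639-2 collective code,
# for languages that ISO 639-2 groups instead of coding individually.
COLLECTIVE_FALLBACK = {
    "BAN": "nic",  # Yemba (Dschang) -> Niger-Kordofanian (Other)
}


def iso639_2(code: str, terminological: bool = False) -> Optional[str]:
    """Return an ISO 639-2 code for a 2-letter ISO 639-1 code or an
    Ethnologue code; return None if neither sample table covers it."""
    if code in ISO_639_1_TO_2:
        bib, term = ISO_639_1_TO_2[code]
        return term if terminological else bib
    # No individual code: fall back to the collective (group) code, if any.
    return COLLECTIVE_FALLBACK.get(code)


if __name__ == "__main__":
    print(iso639_2("de"))                       # bibliographic form
    print(iso639_2("de", terminological=True))  # terminological form
    print(iso639_2("BAN"))                      # collective fallback
```

Note that the fallback loses precision: "nic" identifies a large group of
languages, which is exactly why an application needing finer
classification may prefer the Ethnologue codes.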
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
^^ Rebecca S. Guenther ^^
^^ Chair, ISO 639 Joint Advisory Committee ^^
^^ Senior Networking and Standards Specialist ^^
^^ Library of Congress ^^
^^ Washington, DC 20540-4402 ^^
^^ (202) 707-5092 (voice) (202) 707-0115 (FAX) ^^
^^ ^^
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>
> Date: Wed, 16 Aug 2000 16:09:16 +0100
> From: José Luis Borbinha
> Subject: Fw: Re: foreign language classifying
>
> I guess this must be relevant for the DCMI, especially for the DCq...
>
> ----- Original Message -----
> From: "Steven Bird"
> Sent: Wednesday, 16 August 2000 15:13
> Subject: Re: foreign language classifying
>
>
> > The primary language identification schemes are:
> >
> > ISO 639-2: Codes for the Representation of Names of Languages
> > http://lcweb.loc.gov/standards/iso639-2/
> > (There are two versions of the standard, one for bibliographic
> > use, and one for terminological use, which differ on 5% of the codes.)
> >
> > and
> >
> > RFC 1766: Tags for the Identification of Languages
> > http://www.ietf.org/rfc/rfc1766.txt
> >
> > However, these schemes have only a few hundred codes, while the world
> > has over six thousand distinct languages.
> >
> > Another language identification scheme, which also includes language
> > classification information (to the extent that it is known/agreed) and
> > a definition of the linguistic denotation of each code (rather than
> > just conventional language names), is the Ethnologue. The Ethnologue
> > has been compiled over a period of more than 50 years, and will soon
> > be out in its 14th edition.
> >
> > Ethnologue: Languages of the World
> > http://www.sil.org/ethnologue/
> >
> > The Ethnologue provides 3-letter codes for over 6,800 distinct
> > languages. Ethnologue data can be accessed over the web for free.
> > There is a search interface at: http://www.sil.org/ethnologue/search/
> >
> > ----
> >
> > For example, the language I work on (Dschang, Cameroon) does not get
> > its own code in ISO 639-2. It has to be identified as "nic -
> > Niger-Kordofanian (Other)". But this groups it with other languages
> > from all over sub-Saharan Africa for which ISO didn't assign distinct
> > codes, and is quite useless from the standpoint of classification.
> > Someone who wanted to find resources for Grassfields languages ought
> > to be able to find my Dschang dictionary, but 99% of the languages
> > covered by their search would be irrelevant.
> >
> > Using the Ethnologue search interface I can quickly find the URL
> > giving demographic and linguistic information for this language:
> >
> > http://www.sil.org/ethnologue/countries/Came.html#BAN
> >
> > YEMBA (TCHANG, DSCHANG, BAFOU, ATSANG-BANGWA, BANGWA, BAMILEKE-YEMBA)
> > [BAN] 300,000 or more (1992 SIL). Major part of Menoua Division,
> > centered around Dschang, West Province. Niger-Congo, Atlantic-Congo,
> > Volta-Congo, Benue-Congo, Bantoid, Southern, Wide Grassfields, Narrow
> > Grassfields, Mbam-Nkam, Bamileke. Dialects: YEMBA, FOREKE DSCHANG
> > (DSCHANG, TCHANG). Part of a language continuum which includes Ngwe
> > and Ngyemboon. 15% to 25% literate.
> >
> > This data is in text form. I understand that an Ethnologue client
> > that processes XML queries and returns XML data is planned.
> >
> > Steven Bird
> >
> > --
> > http://www.ldc.upenn.edu/sb
> > Assoc Director, LDC; Adj Assoc Prof, CIS & Linguistics
> > Linguistic Data Consortium, University of Pennsylvania
> > 3615 Market St, Suite 200, Philadelphia, PA 19104-2608