Andy Powell wrote:
> I'd therefore be tempted to re-ask your question in a slightly different,
> two-part, form:
>
> 1) is there any evidence that the value of manually assigning subject
> classifications to open access scholarly publications improves scholarly
> communication sufficiently over full-text indexing approaches to outweigh
> the costs of doing so? (My answer: almost certainly not).

While I mostly agree that the focus of effort should be on automatic
classification, I think arXiv serves as an example of the use of a
manual classification which has high value. The author-supplied
classifications have driven alerting for 17 years now and I think have
been important in acceptance of arXiv through the fostering of a sense of
community.

We also use the author-supplied classification to direct new submissions
to appropriate moderators. We are currently experimenting with the use
of an automatic classifier to alert administrators to possible
mis-classifications, and later to suggest classifications to submitters. Our
experience of extracting articles from the existing corpus to seed the
quantitative biology category (q-bio) was positive and is described in
http://arxiv.org/abs/cs/0312018 . It may be that at some time arXiv could
do away with the manual classification but it may have lasting value in
community building, in providing the user with a sense of agency, and as a
double-check.

One aspect of automatic classification we should not forget is that one
can rerun over the whole corpus at any time -- something simply impractical
with manual schemes. Thus it can be expected to cope with changes in a
classification scheme as subjects evolve, or provide different views for
different user communities (given an appropriate training set).
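
To make the rerun idea concrete, here is a toy sketch. It is not what arXiv
runs: the naive Bayes word-count model, the function names (train, classify,
flag_mismatches) and the record layout are all assumptions made purely for
illustration. It retrains from whatever training set is supplied, reclassifies
every record, and reports those where the prediction disagrees with the
author-supplied category.

```python
import math
from collections import Counter, defaultdict


def train(labelled):
    """Build a toy naive Bayes model from (text, category) pairs.

    Hypothetical helper for illustration only.
    """
    word_counts = defaultdict(Counter)  # category -> word frequencies
    doc_counts = Counter()              # category -> number of documents
    for text, category in labelled:
        doc_counts[category] += 1
        word_counts[category].update(text.lower().split())
    return word_counts, doc_counts


def classify(text, model):
    """Return the most probable category, using add-one smoothing."""
    word_counts, doc_counts = model
    vocab = {w for counts in word_counts.values() for w in counts}
    total_docs = sum(doc_counts.values())
    best_category, best_score = None, -math.inf
    for category in sorted(doc_counts):
        total_words = sum(word_counts[category].values())
        score = math.log(doc_counts[category] / total_docs)
        for word in text.lower().split():
            count = word_counts[category][word]
            score += math.log((count + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_category, best_score = category, score
    return best_category


def flag_mismatches(corpus, model):
    """Reclassify every (id, text, supplied_category) record in the corpus.

    Returns (id, supplied, predicted) for each record where the two differ,
    i.e. the candidates an administrator might want to look at.
    """
    flagged = []
    for doc_id, text, supplied in corpus:
        predicted = classify(text, model)
        if predicted != supplied:
            flagged.append((doc_id, supplied, predicted))
    return flagged
```

Changing the scheme, or producing a view for a different community, then
amounts to calling train() on a different training set and running
flag_mismatches() (or classify()) over the whole corpus again.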

Cheers,
Simeon