Dear List members
Can anyone help this student with his problem?
M Graham
List Owner
-----Original Message-----
From: Chaker Jabbari [mailto:[log in to unmask]]
Sent: 05 January 2005 11:55
To: [log in to unmask]
Subject: chaker : phd student question
Dear Sir or Madam,
I'm a Tunisian PhD student. I'm currently a lecturer at King Saud University
(Saudi Arabia).
My research is about HTML document type identification.
At the moment I have identified 12 document types:
dictionary, patent, book, thesis, memory, report, paper, call for papers,
FAQ, web page, news, email.
To identify the document type I use three criteria:
1- the size, in number of words;
2- the logical structure of the document, extracted using the HTML heading
tags (<Hn>);
3- linguistic expressions (such as "this paper", "the following thesis", ...).
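A minimal sketch of how the three criteria might be extracted from one page, using only Python's standard library (html.parser and re). The cue-phrase lists, the class StructureExtractor and the function extract_features are hypothetical illustrations, not the author's actual lists or code; word counting with \w+ is only an approximation.

```python
import re
from html.parser import HTMLParser

# Hypothetical cue phrases per document type (illustrative only).
CUE_PHRASES = {
    "paper": ["this paper"],
    "thesis": ["this thesis", "the following thesis"],
    "call for papers": ["call for papers", "submission deadline"],
}

class StructureExtractor(HTMLParser):
    """Collects visible text and the sequence of <h1>..<h6> headings."""

    def __init__(self):
        super().__init__()
        self.text_parts = []
        self.headings = []           # list of (level, heading text)
        self._current_level = None   # level of the heading currently open
        self._current_heading = []
        self._skip = 0               # depth inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names in lower case.
        if tag in ("script", "style"):
            self._skip += 1
        elif re.fullmatch(r"h[1-6]", tag):
            self._current_level = int(tag[1])
            self._current_heading = []

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1
        elif re.fullmatch(r"h[1-6]", tag) and self._current_level is not None:
            text = " ".join(p for p in self._current_heading if p)
            self.headings.append((self._current_level, text))
            self._current_level = None

    def handle_data(self, data):
        if self._skip:
            return  # ignore script and style contents
        self.text_parts.append(data)
        if self._current_level is not None:
            self._current_heading.append(data.strip())

def extract_features(html):
    """Return word count (criterion 1), headings (2) and cue counts (3)."""
    parser = StructureExtractor()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.text_parts)
    words = re.findall(r"\w+", text)
    normalized = " ".join(text.lower().split())
    cue_counts = {
        doc_type: sum(
            len(re.findall(r"\b" + re.escape(p) + r"\b", normalized))
            for p in phrases
        )
        for doc_type, phrases in CUE_PHRASES.items()
    }
    return {"word_count": len(words), "headings": parser.headings,
            "cue_counts": cue_counts}
```

The resulting feature dictionary could then be fed to hand-written rules or to any classifier; the sketch deliberately leaves that decision step out, since the message does not say how the criteria are combined.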
But I have a problem testing my approach: I don't have a collection of
HTML documents covering all 12 types.
I have tried collections such as spirit, but in these collections we
cannot find a complete HTML document that describes a thesis, a book or a
dictionary.
I would welcome any ideas or suggestions.
Thank you very much,
Chaker Jebari