Dear All,
I am working on the classification of a particular kind of tumour. The
severity of these tumours is rated on a continuous scale from 0 to 100,
with 0 being extremely severe and 100 being very mild. The severity of
the tumour is then coded into the following grades:
A1 0-10
A2 10-20
A3 20-30
A4 30-40
A5 40-50
A6 50-60
A7 60-70
A8 70-80
A9 80-90
A10 90-100
The severity of a tumour is very difficult to determine by direct inspection,
but three continuous variables X, Y and Z have been identified that are good
predictors of it.
In the past we have used linear regression models of the form
severity = a + b*X + c*Y + d*Z + error
(a, b, c and d are the model parameters, estimated by least squares)
and then recoded the fitted values to one of the categories above.
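To make the procedure concrete, here is a minimal sketch of the fit-then-recode step in Python with NumPy. The function names are my own, and the handling of the grade boundaries is an assumption: the table above does not say which grade a score of exactly 10, 20, ... belongs to, so the sketch treats each bin as closed on the left (10 goes to A2) and puts 100 in A10.

```python
import numpy as np

def severity_to_grade(severity):
    """Map severity scores to grade numbers 1..10 (A1..A10).

    Assumption: bins are [0,10) -> A1, [10,20) -> A2, ..., [90,100] -> A10.
    Fitted values outside 0-100 are clipped into the nearest end grade.
    """
    s = np.clip(np.asarray(severity, dtype=float), 0.0, 100.0)
    return np.minimum((s // 10).astype(int) + 1, 10)

def fit_ols(X, y):
    """Least-squares estimates [a, b, c, d] for y = a + b*X + c*Y + d*Z."""
    X = np.asarray(X, dtype=float)
    design = np.column_stack([np.ones(len(X)), X])  # intercept column first
    coef, *_ = np.linalg.lstsq(design, np.asarray(y, dtype=float), rcond=None)
    return coef

def predict(coef, X):
    """Fitted severity for each row of X (columns X, Y, Z)."""
    return coef[0] + np.asarray(X, dtype=float) @ coef[1:]

if __name__ == "__main__":
    # Made-up data purely for illustration; not the tumour data.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 3))
    y = 55 + X @ np.array([8.0, -5.0, 3.0]) + rng.normal(scale=5.0, size=200)
    coef = fit_ols(X, y)
    predicted_grade = severity_to_grade(predict(coef, X))
    print(coef, predicted_grade[:10])
```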
We have fitted this model on large datasets and tested its performance on
a hold-out sample.
If the predicted severity grade is within two grades of the actual grade then
that is considered a correct classification, i.e. if the true grade is A6 then
a prediction of A4, A5, A6, A7 or A8 is considered correct.
What we have found in the past is that this model predicts about 90% of
tumours correctly, which is more than satisfactory.
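The "within two grades" rule can be sketched as follows. The function names are hypothetical, and grades are represented by their numbers 1..10 for A1..A10. The per-grade breakdown is not something we have reported before; I include it only because an overall figure can hide poor performance in grades with few tumours.

```python
import numpy as np

def within_k_accuracy(true_grade, pred_grade, k=2):
    """Fraction of cases where the predicted grade is within k grades of the truth."""
    t = np.asarray(true_grade)
    p = np.asarray(pred_grade)
    return float(np.mean(np.abs(t - p) <= k))

def per_grade_accuracy(true_grade, pred_grade, k=2):
    """Same rule, reported separately for each true grade that occurs."""
    t = np.asarray(true_grade)
    p = np.asarray(pred_grade)
    result = {}
    for g in np.unique(t):
        mask = t == g
        result[int(g)] = float(np.mean(np.abs(t[mask] - p[mask]) <= k))
    return result

if __name__ == "__main__":
    # True grade A6 throughout; predictions A3..A9.
    # A4..A8 count as correct, A3 and A9 do not.
    print(within_k_accuracy([6] * 7, [3, 4, 5, 6, 7, 8, 9]))
```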
My problem is that I want to create a model for a different kind of tumour.
However, this new tumour is rarer than the ones we have looked at in the past.
The data set we have available is distributed across the grades as follows
(number of tumours per grade):
A1 1
A2 3
A3 8
A4 10
A5 58
A6 20
A7 90
A8 12
A9 10
A10 9
I have fitted a model as I have in the past and find I get good prediction in
the middle of the range, i.e. A5-A8, but I am having difficulty identifying
tumours at either end of the spectrum.
Someone has suggested weighted regression, giving heavier weights to the
grades with fewer observations. This does improve the performance of the model
slightly throughout the range, but I am unsure whether this is a valid
approach.
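For reference, this is a minimal sketch of the kind of weighting I mean, assuming weights inversely proportional to the number of observations in each grade. That choice is only one possible heuristic and is exactly what I am unsure about. The function names are my own; COUNTS repeats the distribution listed above. Note that under this scheme the single A1 tumour carries as much total weight as all 90 A7 tumours combined.

```python
import numpy as np

# Observations per grade (A1..A10), copied from the table above.
COUNTS = {1: 1, 2: 3, 3: 8, 4: 10, 5: 58, 6: 20, 7: 90, 8: 12, 9: 10, 10: 9}

def inverse_frequency_weights(grades):
    """Weight each case by 1 / (number of cases in its grade), rescaled to mean 1.

    Assumption: inverse-frequency weighting, so every grade that occurs
    ends up with the same total weight.
    """
    grades = np.asarray(grades)
    values, counts = np.unique(grades, return_counts=True)
    count_of = dict(zip(values.tolist(), counts.tolist()))
    w = np.array([1.0 / count_of[g] for g in grades.tolist()])
    return w * len(w) / w.sum()

def fit_wls(X, y, w):
    """Weighted least squares for y = a + b*X + c*Y + d*Z.

    Minimises sum(w_i * residual_i**2) by scaling each row by sqrt(w_i)
    and solving the resulting ordinary least-squares problem.
    """
    X = np.asarray(X, dtype=float)
    design = np.column_stack([np.ones(len(X)), X])
    sw = np.sqrt(np.asarray(w, dtype=float))
    coef, *_ = np.linalg.lstsq(design * sw[:, None],
                               np.asarray(y, dtype=float) * sw, rcond=None)
    return coef

if __name__ == "__main__":
    # One grade label per tumour, matching COUNTS (221 cases in total).
    grades = np.repeat(list(COUNTS), list(COUNTS.values()))
    w = inverse_frequency_weights(grades)
    print(len(w), w[grades == 1].sum(), w[grades == 7].sum())
```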
I am interested to know other people's thoughts on this. Is weighted
regression suitable? How do I choose my weights? Is there any literature
available on weighting in situations like this?
And if weighted regression is incorrect, what other methods could I consider
using?
All opinions welcome and I will post back a summary of any interesting
replies.
Regards
Mark