Dear Allstatters,
I am looking for references or suggestions to help with this problem I am tackling.
I have a population of 60k individuals (i) who each possess J(i) items. These items are hard to keep track of and so whilst a few individuals know their value of J, most will have no idea and will have to guess what J is. My task is to estimate the total of J and its confidence interval.
Each individual has submitted an estimate of J (they have a financial incentive to do so), but the estimates are made by choosing from a list of numerical bands, e.g. 1 to 4, 5 to 9, etc., so I have a data set [J)(i), where [...) represents a band for J. There are only 6 bands, though, and the top band is 500 or more, i.e. unbounded. The median estimate is in the lowest band (1 to 19) and up to 5% of individuals estimate they are in the top band, i.e. 500 or more items.
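To make the difficulty with the open-ended top band concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not a proposed method: only the bands "1 to 19" and "500 or more" come from the description above, the four middle bands are invented, and the function names are hypothetical. It shows that the banded estimates alone give a lower bound on the total but no finite upper bound, and that any point estimate depends on the value assumed for the top band.

```python
import math

# Hypothetical band edges: only "1 to 19" and "500 or more" are taken from
# the description above; the four middle bands are invented for illustration.
BANDS = [(1, 19), (20, 49), (50, 99), (100, 249), (250, 499), (500, math.inf)]

def total_bounds(band_counts):
    """Bounds on the total of J implied by the bands, taking every
    self-reported band at face value (the reports are guesses, so these
    are not hard bounds)."""
    lower = sum(n * lo for n, (lo, hi) in zip(band_counts, BANDS))
    # Skip empty bands so that 0 * inf never appears in the sum.
    upper = sum(n * hi for n, (lo, hi) in zip(band_counts, BANDS) if n > 0)
    return lower, upper

def total_with_top_mean(band_counts, top_mean):
    """Midpoint imputation for the closed bands plus an assumed mean
    for the unbounded top band."""
    total = 0.0
    for n, (lo, hi) in zip(band_counts, BANDS):
        value = top_mean if math.isinf(hi) else (lo + hi) / 2
        total += n * value
    return total

# Purely illustrative counts for 60k individuals (not real data):
# the median sits in the lowest band and 5% are in the top band.
example_counts = [33000, 12000, 6000, 4000, 2000, 3000]
for assumed_mean in (750, 2000, 10000):
    print(assumed_mean, total_with_top_mean(example_counts, assumed_mean))
```

With counts like these, the printed totals move substantially as the assumed top-band mean changes, which is why the tail cannot be settled from the banded estimates alone.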
My intended approach was to undertake kernel density estimation to derive a probable distribution for J and then sum that up. Assisting me in this endeavour is a convenience sample (i.e. very non-random) which gives partial counts j(i), i.e. j(i) <= J(i). This sample currently gives partial counts for 80% of the individuals. In some cases j(i) = [J)(i), i.e. j is in the estimated band of J, but many j(i) are in lower bands. Indeed, some individuals have underestimated, so that j(i) > [J)(i). The mean of j is 75 but the sample is highly skewed, with a median value for j of 3, a 99th percentile of 1300 and a maximum of 100,000! Whilst only 2% of individuals have a partial count j > 500, they account for 70% of the total of j.
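The three cases described above (j below the reported band, j inside it, j above it) can be sketched in Python as follows. This is a minimal illustration, assuming hypothetical band edges: only the lowest band (1 to 19) and the top band (500 or more) are given in this message, and the names `band_index`, `compare` and `lower_bound` are made up for the example.

```python
import bisect

# Hypothetical lower edges of the 6 bands: only 1 (lowest band, 1 to 19)
# and 500 (top band) are given; the middle edges are invented.
BAND_LOWER_EDGES = [1, 20, 50, 100, 250, 500]

def band_index(count):
    """Index (0 to 5) of the band containing a count; -1 for a count of 0."""
    return bisect.bisect_right(BAND_LOWER_EDGES, count) - 1

def compare(partial_count, reported_band):
    """Classify a partial count j against the self-reported band index."""
    observed = band_index(partial_count)
    if observed > reported_band:
        return "underestimated"  # j already exceeds the reported band
    if observed == reported_band:
        return "consistent"      # j falls inside the reported band
    return "below band"          # j is only a partial count, so no conflict

def lower_bound(partial_count, reported_band, trust_band=True):
    """Per-individual lower bound on J. The partial count always applies
    because j(i) <= J(i); the band floor applies only if the guess is trusted."""
    floor = BAND_LOWER_EDGES[reported_band] if trust_band else 0
    return max(partial_count, floor)

print(compare(3, 2), lower_bound(3, 2))      # partial count below the band
print(compare(600, 0), lower_bound(600, 0))  # individual who underestimated
```

Summing `lower_bound` over individuals would give a floor for the total of J, but it says nothing about how far above that floor the true total lies.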
The extreme skewness of the partial count data j makes me uncomfortable. I am concerned that my kernel density estimate will significantly underestimate the tail of the distribution, and it feels like I should be looking at extreme value analysis (EVA) instead. However, I have little experience of EVA, and what I have read so far has not helped me ensure that the tail of my distribution is reasonable. So I would really appreciate it if someone could guide me to references that would help with this issue.
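For readers unfamiliar with EVA, one common starting point for judging how heavy a tail is (not necessarily the right tool for this problem) is the Hill estimator of the tail index. The Python sketch below is a minimal illustration under assumptions: the data are synthetic Pareto draws rather than the partial counts j, and the choice of k (how many of the largest values to use) is arbitrary here, whereas in practice it needs care.

```python
import math
import random

def hill_tail_index(values, k):
    """Hill estimate of the tail index alpha from the k largest values.

    A smaller alpha means a heavier tail; for a Pareto-type tail,
    alpha <= 1 corresponds to an infinite mean.
    """
    ordered = sorted((v for v in values if v > 0), reverse=True)
    if not 0 < k < len(ordered):
        raise ValueError("k must be at least 1 and below the number of positive values")
    threshold = ordered[k]  # the (k+1)-th largest value
    # Mean log-excess of the k largest values over the threshold.
    # Heavy ties at the threshold (possible with integer counts) would
    # shrink this and need separate handling.
    mean_log_excess = sum(math.log(v / threshold) for v in ordered[:k]) / k
    return 1.0 / mean_log_excess

# Synthetic check on Pareto data with a known tail index of 1.5.
rng = random.Random(0)
sample = [rng.paretovariate(1.5) for _ in range(20000)]
alpha_hat = hill_tail_index(sample, k=2000)
print(round(alpha_hat, 2))
```

Repeating the estimate over a range of k and looking for a stable region is the usual diagnostic. An estimate near or below 1 would suggest that the mean, and hence the total, is dominated by a handful of the largest values.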
Thank you in advance.
Regards
Nigel Marriott CStat
Independent Statistician
Find me on Twitter <https://twitter.com/MarriottNigel>
Read my Blog <https://marriott-stats.com/nigels-blog/>
[log in to unmask]
T: +44 (0) 1225 489 033
M: +44 (0) 7734 069 997
www.marriott-stats.com <http://www.marriott-stats.com/>
Registered in England, Company No. 5577275, VAT No. 883304029.
Registered Office - 2 Temple Street, Keynsham, Bristol, BS31 1EG