I feel I am rushing in where angels fear to tread...
The selection of a sample from among 700 objects depends primarily
upon what you wish to discover, and what sort of basic assumptions
you make.
1) If you assume that all the objects are randomly taken from a
uniform population (i.e., the set-up of an intro stat question), then
selecting n items at random would do you just fine. How many is n?
You haven't yet said what question you want to answer.
2) If you assume that there are inherent differences among the 700
(for example, if you wish to know the typical wage rate paid and you
know that some of the 700 are retail stores and the rest are
manufacturers), then it may well benefit you to sample the two groups
separately, in the same proportion as each group appears in the 700,
keeping track of which group each piece of info came from.
3) You also need to know what sort of question you wish to answer,
_before_ you collect the data. In fact, ask the question now to help
decide how to make the sample.
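
To make option 2 concrete, here is a minimal sketch of proportional
sampling from two groups. It is in Python rather than R, purely for
illustration. The function name stratified_sample, the 500/200 split
between retail and manufacturing, and n = 70 are all assumptions of
mine, not anything from your data.

```python
import random
from collections import defaultdict

def stratified_sample(items, group_of, n, seed=0):
    """Draw about n items, allocating to each group in proportion to its size."""
    rng = random.Random(seed)  # fixed seed so the draw can be repeated
    groups = defaultdict(list)
    for item in items:
        groups[group_of(item)].append(item)
    sample = {}
    for name, members in groups.items():
        # Proportional allocation, with at least one item per group.
        k = max(1, round(n * len(members) / len(items)))
        sample[name] = rng.sample(members, min(k, len(members)))
    return sample

# Hypothetical frame: 700 firms, 500 retail and 200 manufacturing.
firms = [("retail", i) for i in range(500)] + [("manufacturer", i) for i in range(200)]
picked = stratified_sample(firms, group_of=lambda f: f[0], n=70)
print({name: len(members) for name, members in picked.items()})
```

The sample is returned per group, so you always know which group each
item came from. The right n still depends on the question you want to
answer.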
There are too many alternative developments at this point to answer
your question in an email. We need to know what you want to do with
the sample (what questions you will ask of it), and what factors you
think might influence the data and your actions based on it.
['Factors' = selectable characteristics of the 700.] In the process of
developing that question, and the factors, I suspect you will at
least half answer your own question. Such is statistics!
Now for question 2.
You ask for a model that will detect objects (businesses?) that are
not altogether forthright about their activities (and presumably tax
payments). Would that we all had such a detector! The US Federal
deficit would be cut in half overnight if everyone reported and paid
as much tax as the IRS thinks they should!
In your example, you would need to know how much business a
restaurant is doing, and how much you would expect it to do for the
size (number of tables) it is. Again, the US IRS has excellent
equations for predicting true business activity, but they may not
want to pass them out to everyone. National restaurant associations
probably can tell you how much business you should expect to have,
for a given size and location, and type of restaurant. Such would be
needed in order to work up business plans. I expect the same would
be true for other retail business firms as well.
Once you have the equations (model), you would need to put in the
indicators of activity for each firm involved, and look for large
deviations. How much deviation indicates erroneous reporting? That
would depend on the accuracy and precision of the model. You could
at least select the 3 with the largest (tax loss) deviations, and
look more closely at them.
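
As a rough sketch of that last step (again in Python for illustration,
not R): given some model of expected activity, rank the firms by how
far their reported figures fall short and pull out the top 3. The
model here, 2000 in sales per table, is a made-up placeholder and not
an industry figure. The firm records and the function names are
hypothetical as well, and I am assuming the deviation of interest is
reported activity falling below what the model expects.

```python
def flag_largest_shortfalls(firms, expected, k=3):
    """Return the k firms whose reported activity falls furthest below expectation."""
    def shortfall(firm):
        return expected(firm) - firm["reported"]
    return sorted(firms, key=shortfall, reverse=True)[:k]

# Placeholder model (an assumption): expected sales proportional to table count.
def expected_sales(firm):
    return 2000.0 * firm["tables"]

# Hypothetical records, invented only to exercise the function.
firms = [
    {"name": "A", "tables": 5, "reported": 9800.0},
    {"name": "B", "tables": 12, "reported": 15000.0},
    {"name": "C", "tables": 8, "reported": 16500.0},
    {"name": "D", "tables": 20, "reported": 22000.0},
    {"name": "E", "tables": 6, "reported": 7000.0},
]
flagged = flag_largest_shortfalls(firms, expected_sales)
print([f["name"] for f in flagged])
```

A fixed cutoff of 3 says nothing about whether any of the flagged firms
is actually misreporting. That judgment depends on how accurate and
precise the model is.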
Don't know if that approaches support for your solution, but I tried.
Jay
On Dec 16, 2005, at 4:00 PM, james brown wrote:
> Hello Dear
> I need to select among 700 objects a good representative sample.
> These objects could be residential houses, commercial buildings,
> trucks, etc.
> How to get a good sample size and select a set of objects that is
> very representative.
>
> The second part of my question is to find a statistical model in R
> that detects objects that are most likely used as their owners told
> the municipality. For example, if a restaurant is suppose to have 5
> tables, we want to know that it doesn't have more. The goal is to
> have a model that flags such restaurant for inspection.
>
> Cheers, Dan
>
Jay Warner
Principal Scientist
Warner Consulting, Inc.
4444 North Green Bay Road
Racine, WI 53404-1216
USA
Ph: 262.634.9100
FAX: 262.681.1133
web: www.a2q.com
The A2Q Method(tm) --- What do you want to improve today?