Hello to all!!!
I'm new to this list and recently finished a Computer Science course. I
have a question about a project I need to complete, so I'll describe the
data I'm analyzing.
There are many data samples, each containing LON, LAT and PREC, which are
longitude, latitude and precipitation values. As the precipitation values
are prone to erroneous measurements, sometimes a point (LON, LAT) has, for
example, a precipitation value of 2.43 while its closest neighbouring point
has a value of 70.45. The distance I'm using is a Euclidean distance
computed from the angles and arcs (not a spherical triangle, because I
didn't understand how to derive the formulas for the distance between two
points on the surface of a sphere; if someone could show me, I could use it
later and I'd be very grateful).
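
For reference, the distance between two points on a sphere is commonly
computed with the haversine formula. The following is only a minimal sketch,
assuming LON and LAT are in decimal degrees and treating the Earth as a
sphere of radius 6371 km; the function name and the radius constant are
illustrative, not part of the data set described above.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius (spherical approximation)

def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in km between two (lon, lat) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    # Haversine of the central angle between the two points.
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    # Clamp to 1.0 to guard against rounding slightly above 1.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))
```

For points that are close together the result differs little from a
Euclidean distance on the angles, but the difference grows at high
latitudes, where one degree of longitude covers much less ground.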
The problem is that these values are daily data, and I think it would be
important to do some statistical study and try to model the behavior of the
distribution and the "possible failures" due to wrongly measured values. I
looked at outlier theory, but it does not seem to give good results because
it depends on the distribution's skewness, which is something that I don't
know.
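
One option that might reduce the dependence on skewness is a robust rule
based on the median and the median absolute deviation (MAD), applied after a
log transform. The sketch below is an assumption on my part, not an
established recipe for rainfall: the log1p transform, the 3.5 threshold and
the function name are all illustrative choices.

```python
import math
import statistics

def flag_outliers(values, threshold=3.5):
    """Return indices whose modified z-score exceeds the threshold.

    Values are log1p-transformed first (assumption: non-negative,
    right-skewed data such as daily precipitation).
    """
    logged = [math.log1p(v) for v in values]
    med = statistics.median(logged)
    mad = statistics.median([abs(x - med) for x in logged])
    if mad == 0:
        # More than half the values are identical (e.g. many dry days);
        # the score is undefined, so nothing is flagged in this sketch.
        return []
    # 0.6745 scales the MAD so the score is comparable to a z-score
    # under a normal distribution.
    return [i for i, x in enumerate(logged)
            if abs(0.6745 * (x - med) / mad) > threshold]
```

With many dry days the MAD can be zero, so it may be worth applying the rule
only to days with non-zero precipitation.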
I'd like to know how I could use the distance between the points to analyze
the precipitation values (for instance, using the N closest points, and what
this "N" should be). I have implemented the Closest Pair algorithm, which is
the fastest one for finding closest points (its complexity is O(n*log(n))),
and I used it because there are hundreds of thousands of points.
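
To make the idea concrete, here is a rough sketch of comparing each point
with the median of its N nearest neighbours. It is a brute-force O(n^2)
version for illustration only; the function names, the default N = 5 and the
haversine distance with a 6371 km radius are assumptions, and for hundreds
of thousands of points a spatial index (a grid or a k-d tree) would be
needed instead. Note that a closest-pair algorithm returns only the single
closest pair overall, not the neighbours of every point.

```python
import heapq
import math
import statistics

def haversine_km(lon1, lat1, lon2, lat2, radius_km=6371.0):
    """Great-circle distance in km; inputs in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin((phi2 - phi1) / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(min(1.0, a)))

def neighbour_residuals(points, n_neighbours=5):
    """For each (lon, lat, prec) point, return prec minus the median
    precipitation of its n nearest neighbours (needs at least 2 points)."""
    residuals = []
    for i, (lon, lat, prec) in enumerate(points):
        # (distance, precipitation) for every other point.
        others = [(haversine_km(lon, lat, lon2, lat2), prec2)
                  for j, (lon2, lat2, prec2) in enumerate(points) if j != i]
        # Tuples compare by distance first, so this picks the nearest ones.
        nearest = heapq.nsmallest(n_neighbours, others)
        local_median = statistics.median([p for _, p in nearest])
        residuals.append(prec - local_median)
    return residuals
```

A point whose residual is large compared with the others would be a
candidate for a wrong measurement; how large is "large", and which N to use,
is exactly what I am unsure about.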
Thanks for your attention.
Looking forward to hearing from you.
--
Henrique