On Fri, 3 Sep 1999, Michael Massmann wrote:
> I have a vector with 40,000 observations and I am trying to find out
> whether it contains any sets of two (or more) equal elements.
the following function returns a matrix in out with two rows, where the
first row contains the unique elements in the input vector invec and the
second row contains the counts of the elements in the first row.
#include <oxstd.h>

// invec: input vector; out: address of the result variable (call as rpt02(y, &cnt))
rpt02(const invec, const /*&*/out)
{
    // row 0 is unique elements in vector
    out[0] = unique(invec);
    // row 1 is count of elements in row 0 (drop last count which is 0)
    out[0] |= countc( invec, out[0] )[0:sizec(out[0])-1][]';
}
the above function returns only the unique values and their counts, not
where they occur. if you want the indices/positions of each repeat, you
will have to call vecindex() for each unique element in the vector.
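
a rough sketch of that per-element lookup, in case it helps. the helper
name print_repeats is made up for illustration and is not part of the
code above; it assumes invec is a column vector and uses the
one-argument form of vecindex(), which returns the indices of the
non-zero elements of its argument. it scans the whole vector once per
unique value, so it will be slow when there are many distinct values.

```ox
#include <oxstd.h>

// hypothetical helper (not from the original post): print the positions
// of every value that occurs more than once in the column vector invec
print_repeats(const invec)
{
    decl vals = unique(invec);      // sorted unique values, as a row vector
    decl i, idx;

    for (i = 0; i < sizec(vals); ++i)
    {
        // invec .== value is 1 where the value occurs, 0 elsewhere;
        // vecindex() turns that into a column of (zero-based) positions
        idx = vecindex(invec .== vals[0][i]);
        if (sizer(idx) > 1)         // report only values seen at least twice
            println("value ", vals[0][i], " at positions ", idx');
    }
}
```

call it as print_repeats(y) with the same y that is passed to rpt02().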
> The problem is that both versions take ages to run on my machine (Pentium
> I, 166MHz). My time projections point to hours if not days!
my deepest condolences for having to use such an antiquated machine. just
to make you jealous, the program
main()
{
    // fake data
    decl n = 50000;
    decl y = round( 1000*ranu(n,1) );
    decl t0, elp;   // for timing
    decl cnt;       // counter output

    t0 = timer();
    rpt02(y, &cnt);
    elp = timespan(t0);
    println("rpt02 = ", elp, " seconds");

    // assert (counts must add up to obs)
    if (sumr(cnt[1][]) != sizer(y)) {
        println("assertion failed in rpt02");
    }
}
takes about 3 seconds on my pentium II (333mhz).
h.
--------------------------------------------------------------------
Time series regression studies give no sign of converging toward the
truth. (Phillip Cagan)