pdist: various problems

Francesco Potortì Potorti at isti.cnr.it
Tue Nov 11 14:40:51 CST 2008


I am studying clustering and started with the pdist function in the
statistics package.

I see that it has some problems.  For one, the seuclidean, mahalanobis,
minkowski, cosine, correlation and spearman distances give various
errors when tried on the matrix [1 3; 2 3; 4 2; 4 5; 5 2].

Second, while the euclidean, cityblock and chebychev distances appear to
work, the hamming and jaccard distances apparently give wrong results.
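
For anyone comparing outputs, here is a small reference sketch in plain
Python (not Octave, so it is independent of any pdist implementation).
It computes the pairwise distances for the five rows above by hand.  The
pair order (1,2), (1,3), ..., (4,5) and the hamming definition (fraction
of coordinates that differ) are assumptions taken from how Matlab's
pdist is documented; the helper names are made up for illustration.

```python
from itertools import combinations
from math import sqrt

# The five observations from the example, one row per point.
X = [(1, 3), (2, 3), (4, 2), (4, 5), (5, 2)]

def pairwise(rows, dist):
    """Distances for row pairs (1,2), (1,3), ..., (4,5), in that order."""
    return [dist(a, b) for a, b in combinations(rows, 2)]

def d_euclidean(a, b):
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def d_cityblock(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def d_chebychev(a, b):
    return max(abs(x - y) for x, y in zip(a, b))

def d_hamming(a, b):
    # Assumed definition: fraction of coordinates that differ.
    return sum(x != y for x, y in zip(a, b)) / len(a)

euclidean = pairwise(X, d_euclidean)
cityblock = pairwise(X, d_cityblock)
chebychev = pairwise(X, d_chebychev)
hamming = pairwise(X, d_hamming)

print(cityblock)  # [1, 4, 5, 5, 3, 4, 4, 3, 1, 4]
print(chebychev)  # [1, 3, 3, 4, 2, 2, 3, 3, 1, 3]
print(hamming)    # [0.5, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.5, 0.5, 1.0]
```

Since no coordinate in this matrix is zero, a jaccard distance defined
over the nonzero coordinates would be expected to coincide with the
hamming values here.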

But the most significant problem is that the method it uses, as far as I
can see, cannot work for seuclidean, mahalanobis, minkowski, cosine,
correlation, that is, for those where the distance of two vectors
depends on all the other vectors.
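
To make the dependence concrete, here is a minimal sketch for seuclidean
only, again in plain Python.  It assumes the usual definition, where
each coordinate difference is divided by the sample standard deviation
of that column over the whole data set; the function name and signature
are hypothetical.  The distance between the first two rows changes when
the last row is dropped, so it cannot be computed from the two rows
alone.

```python
from math import sqrt
from statistics import stdev

def seuclidean(a, b, rows):
    """Standardized Euclidean distance between a and b.

    Assumption: each coordinate is scaled by the sample standard
    deviation of its column, computed over all of `rows`.
    """
    scales = [stdev(col) for col in zip(*rows)]
    return sqrt(sum(((x - y) / s) ** 2 for x, y, s in zip(a, b, scales)))

X = [(1, 3), (2, 3), (4, 2), (4, 5), (5, 2)]

# Same two rows, two different data sets around them.
d_full = seuclidean(X[0], X[1], X)      # all five rows
d_sub = seuclidean(X[0], X[1], X[:4])   # last row dropped

print(d_full, d_sub)  # the two values differ
```

A function that only ever sees two rows at a time has no access to the
column scales, which would have to be computed once from the full matrix
and passed in.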

I am almost finished rewriting pdist in a saner way, aiming at being
compatible with Matlab's statistics package.  I think that it should go
into the main Octave distribution.  What do people think?

Are there people here who can check it against Matlab's?  I am going to
write the test suite, but I need to check it against Matlab and I do not
have it.

-- 
Francesco Potortì (ricercatore)        Voice: +39 050 315 3058 (op.2111)
ISTI - Area della ricerca CNR          Fax:   +39 050 315 2040
via G. Moruzzi 1, I-56124 Pisa         Email: Potorti at isti.cnr.it
(entrance 20, 1st floor, room C71)     Web:   http://fly.isti.cnr.it/


More information about the Bug-octave mailing list