From: Jon Eirik Jonsson
Subject: Re: data clusters in 3-space
Date: Wed, 22 Sep 1999 12:02:50 -0400
Newsgroups: sci.math
Keywords: multidimensional scaling, cluster algorithms

Chris McCue wrote:
>
> Hello-
>
> First off, I'm sorry if this is the wrong newsgroup for this question.
> If so, I would appreciate a pointer to a more appropriate forum.
>
> As a part of my research work, I generate sets of points in 3-space.
> Is there a known good way to determine any 3-space values that these
> data points cluster around? The ideal solution that I'm hoping for
> would give me both these cluster points and some kind of measure of
> how good they are, so that I could say "OK, find all of the cluster
> points that have a metric of less than 0.1 (or whatever)".
>
> It's probably also important that the method be time efficient, as my
> data sets regularly contain several million points.
>
> Thanks in advance for any help that you can provide.
>
> -Chris

Try multidimensional scaling (MDS). A good overview is Forest Young (ed.),
"Foundations of Multidimensional Scaling" (? title from memory; published
around 1988). Another good overview, which gets into the issue of both
satisfying metric constraints and clustering, is Roger Shepard's "Science"
article, published around 1980.

MDS will represent your data in an n-dimensional metric space. Programs that
perform both scaling and clustering analyses are widely available and
computationally efficient, although several million data points might present
problems :-0 Working with a pairwise-dissimilarity matrix for that many points
will be difficult in this context. Can you reduce it? What are the data?

- Jon

==============================================================================

From: Michael Hochster
Subject: Re: data clusters in 3-space
Date: Thu, 23 Sep 1999 12:36:31 GMT
Newsgroups: sci.math

There is a large body of research on this sort of problem. For a basic
introduction, try "Finding Groups in Data," by Kaufman & Rousseeuw.
From there you can find more advanced references if you need them. The best
approach will depend on exactly what you mean by a "cluster" (e.g., is a long,
skinny, cigar-shaped thing a cluster?) and on what the "representative" points
are used for. You might also get useful responses from sci.stat.math.

Good luck,
Mike
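Neither reply includes code, so here is a rough illustration of the plain clustering route (as opposed to MDS) applied directly to 3-D points: a minimal sketch of Lloyd's k-means in Python. The function name `kmeans_3d`, the farthest-point seeding, and the use of each cluster's RMS radius as the "how good is it" metric are assumptions for illustration only. Neither poster proposed them.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def _dist2(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two 3-D points."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2 + (a[2] - b[2]) ** 2


def kmeans_3d(points: List[Point], k: int, iters: int = 50, seed: int = 0):
    """Hypothetical helper: Lloyd's k-means on 3-D points.

    Returns (centers, rms_radii, labels). rms_radii[j] is the root-mean-square
    distance of cluster j's members from its center, used here as an assumed
    "goodness" metric (smaller means a tighter cluster).
    """
    if not 0 < k <= len(points):
        raise ValueError("k must satisfy 1 <= k <= len(points)")
    rng = random.Random(seed)
    # Farthest-point seeding: start from one random point, then repeatedly add
    # the point farthest from every center chosen so far.
    centers = [points[rng.randrange(len(points))]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(_dist2(p, c) for c in centers)))
    labels: List[int] = []
    for _ in range(iters):
        # Assignment step: attach each point to its nearest center.
        new_labels = [min(range(k), key=lambda j: _dist2(p, centers[j])) for p in points]
        # Update step: move each center to the mean of its members
        # (an empty cluster keeps its old center).
        sums = [[0.0, 0.0, 0.0] for _ in range(k)]
        counts = [0] * k
        for p, j in zip(points, new_labels):
            counts[j] += 1
            for d in range(3):
                sums[j][d] += p[d]
        centers = [tuple(s / counts[j] for s in sums[j]) if counts[j] else centers[j]
                   for j in range(k)]
        if new_labels == labels:  # assignments stopped changing: converged
            break
        labels = new_labels
    # Per-cluster RMS radius; infinite for an empty cluster.
    radii = []
    for j in range(k):
        d2 = [_dist2(p, centers[j]) for p, lab in zip(points, labels) if lab == j]
        radii.append(math.sqrt(sum(d2) / len(d2)) if d2 else math.inf)
    return centers, radii, labels


if __name__ == "__main__":
    rng = random.Random(1)
    # Two synthetic blobs of 200 points each, centered at (0,0,0) and (5,5,5).
    pts = [(rng.gauss(c, 0.02), rng.gauss(c, 0.02), rng.gauss(c, 0.02))
           for c in (0.0, 5.0) for _ in range(200)]
    centers, radii, _ = kmeans_3d(pts, k=2)
    # Report only "tight" clusters, in the spirit of the 0.1 cutoff in the question.
    for c, r in zip(centers, radii):
        if r < 0.1:
            print(c, r)
```

k-means looks for compact, roughly spherical groups, so it would cut a long cigar-shaped cluster into pieces. That is exactly the kind of definitional choice Mike raises, and density-based or hierarchical methods behave differently. The number of clusters k must also be chosen up front. Each iteration costs O(nk) distance computations, so for several million points you would want a vectorized or compiled version, or a random subsample.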