Finding clusters of locations on maps
28.2.2006 / Uncategorized /
So here’s a problem I faced while trying to build a little map application:
Say I have the locations (both in terms of street addresses and latitude/longitude) of various stores in a given vicinity. E.g. All Frys, Circuit Citys, Best Buys etc in my location. Now I want to find clusters where all of them are close to each other. So, for example, I want to find a cluster where a Frys store, Circuit City and Best Buy are all “close” to each other. Why? So that I can go visit all of them at once! I’m willing to travel some extra distance away from my location if I can find such a cluster. What’s an efficient algorithm to find such clusters?

Here’s a possible algorithm:
(1) Let S1, S2… Sn be the sets of locations of each shop. Pick the smallest one. For every shop s that has location (x, y) in this set:
(2) Find shops in each of the other Si that are in the location range (x +/- delta, y +/- delta), where delta is the threshold for determining what’s inside your cluster. These shops go into the “current cluster”.
Let’s analyze the runtime of this algorithm. For simplicity, assume that there are m sets of shops (S1… Sm) and that each of the sets has n elements.
Sort each of the sets both on x and y. Each can be sorted in O(n log n) time, and there are m of them, so this step takes O (mn log n)
Step (2) above can be done with a binary search (that’s why I sorted the sets first) in O(log n) time, and it needs to be repeated n times, so this step takes O(n log n) time.
So our total running time is O(mn log n + n log n) = O(mn log n).
The numbers of stores of which we’re looking for clusters, m, is usually a small constant less than 5. So we basically have an nlogn algorithm for doing this.
Does anyone know of a faster way of doing this?
1.3.2006 at 6:40 pm
[…] « Finding clusters of locations on maps […]
7.5.2006 at 7:08 am
Am I mistaken, or does this look like the travelling salesman problem all over again?
7.5.2006 at 11:20 am
it’s note quite TSP. You don’t have to visit every store location, you just need to visit every brand of store. Still, I agree that the formulation of the problem is not quite right, since it ignores the order that the stores in the cluster are visited. It seems like a more correct formulation would be
Given a set of tuples (X, Y, store), and a starting location (X, Y), find a cycle from start that includes a tuple with every mentioned store that minimizes the cycle length. Since this is a generalization of euclidian TSP (it’s exactly euclidean TSP if each store has only one location), and euclidian TSP seems to be NP-complete (http://portal.acm.org/citation.cfm?id=803626&dl=ACM&coll=portal), good luck finding any efficient optimal algorithm ;).