Glossary term
Glossary term
Foundations
A clustering algorithm closely related to k-means. The practical difference between the two is as follows:
In k-means, centroids are determined by minimizing the sum of the squares of the distance between a centroid candidate and each of its examples.
In k-median, centroids are determined by minimizing the sum of the distance between a centroid candidate and each of its examples.
Note that the definitions of distance are also different:
k-means relies on the Euclidean distance from the centroid to an example. (In two dimensions, the Euclidean distance means using the Pythagorean theorem to calculate the hypotenuse.) For example, the k-means distance between (2,2) and (5,-2) would be:
k-median relies on the Manhattan distance from the centroid to an example. This distance is the sum of the absolute deltas in each dimension. For example, the k-median distance between (2,2) and (5,-2) would be:
L
Created for this library
A logistics team uses k-median as a robust alternative to k-means when delivery locations include extreme outliers that distort means.
A retail analytics team uses k-median on customer features when a small group of high-spenders would otherwise pull cluster centers unhelpfully.
A risk analytics team uses k-median to cluster portfolios because the median is less sensitive to single-account anomalies than the mean.
Definition source: Google for Developers Machine Learning Glossary | Creative Commons Attribution 4.0 License