
K-means & k-medoids clustering

Clustering is about grouping similar things.

K-means and k-medoids do this in different ways. You will see it step by step, with simple visuals and short explanations.

Two paths to grouping: averages vs real examples

One visual, two algorithms

See k-means and k-medoids side by side

Toggle the algorithm, change k, and step through the loop.

K-means centers are averages, so they can land between points. K-medoids centers are actual data points.


What you should notice

  • Points switch colors during assignment.
  • Centers move during update.
  • K-medoids keeps a real example at the center.
Assignment → update → repeat

The algorithm loop

Two repeating phases: assignment and update

Both algorithms follow the same rhythm. The difference is how the center updates.

Phase 1

Assignment

Each point is assigned to its nearest center. That's why colors shift first.

Phase 2

Update

K-means moves the center to the average. K-medoids picks the best representative point.
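The two phases above can be sketched in a few lines of Python. This toy version works on 1-D points with two centers and is only an illustration of the loop, not a production implementation:

```python
def kmeans_1d(points, centers, iters=10):
    for _ in range(iters):
        # Phase 1: assignment. Each point joins its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Phase 2: update. Each center moves to the average of its cluster.
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters

centers, clusters = kmeans_1d([1, 2, 3, 10, 11, 12], [0.0, 5.0])
print(centers)  # [2.0, 11.0]
```

After a couple of passes the loop stabilizes: the colors (assignments) stop changing, and the centers stop moving.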

Callout

K-means is sensitive to outliers. K-medoids is more robust when one extreme point shows up.
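A tiny numeric example (assumed data, not from the visual) makes the difference concrete: one extreme value drags the mean far from the bulk of the points, while the medoid stays on a real one.

```python
points = [1, 2, 3, 4, 100]

# The mean is pulled toward the outlier.
mean_center = sum(points) / len(points)

# The medoid is the member with the smallest total distance to the others.
medoid_center = min(points, key=lambda m: sum(abs(m - p) for p in points))

print(mean_center)    # 22.0
print(medoid_center)  # 3
```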

When should you use each?

A quick decision guide

Use the method that matches your data and your tolerance for outliers.

K-means

  • Numeric data only
  • Distances are meaningful
  • Faster and simpler
  • Sensitive to outliers

K-medoids

  • Works with arbitrary distances
  • Centers are real points
  • More robust to noise
  • Slightly more expensive

Gower distance

Comparing mixed data, one feature at a time

Mixed datasets include numbers, categories, and rankings. Gower distance handles them by normalizing each feature and averaging the result.

Row  Age  Plan type  Satisfaction  Renewal  Weekly usage
A    22   Basic      3             Yes      18
B    45   Pro        5             No       44
C    31   Basic      4             Yes      Missing
D    29   Team       2             No       25
E    52   Pro        1             Yes      30
F    37   Team       5             Yes      36

Comparing rows A and B

Missing values are ignored when averaging the per-feature distances.

Per-feature contribution

Age: 22 vs 45 (contribution 0.77)

Plan type: Basic vs Pro (contribution 1.00)

Satisfaction (1–5): 3 vs 5 (contribution 0.50)

Renewal (yes/no): Yes vs No (contribution 1.00)

Weekly usage: 18 vs 44 (contribution 1.00)

Final Gower distance

0.85, the average of the five per-feature contributions (always between 0 and 1)

Gower distance lets us compare mixed data in a consistent way.
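As a sketch, here is how the A-vs-B comparison above could be computed: numeric features are scaled by their observed range in the table, categorical features contribute 0 on a match and 1 on a mismatch, and missing values are skipped. The `gower` helper and the hard-coded ranges are illustrative assumptions, not a library API.

```python
def gower(a, b, kinds, ranges):
    contribs = []
    for x, y, kind, rng in zip(a, b, kinds, ranges):
        if x is None or y is None:
            continue  # missing values are ignored when averaging
        if kind == "numeric":
            contribs.append(abs(x - y) / rng)  # scale by the feature's observed range
        else:
            contribs.append(0.0 if x == y else 1.0)  # categorical: match or mismatch
    return sum(contribs) / len(contribs)

# Features: age, plan type, satisfaction, renewal, weekly usage.
kinds  = ["numeric", "categorical", "numeric", "categorical", "numeric"]
ranges = [30, None, 4, None, 26]  # ages 22-52, ratings 1-5, usage 18-44

row_a = [22, "Basic", 3, "Yes", 18]
row_b = [45, "Pro",   5, "No",  44]
row_c = [31, "Basic", 4, "Yes", None]  # None marks the missing usage value

print(round(gower(row_a, row_b, kinds, ranges), 2))  # 0.85
print(round(gower(row_a, row_c, kinds, ranges), 2))  # 0.14, averaged over 4 features
```

Comparing A with C shows the missing-value rule in action: the usage feature is dropped and the average runs over the four features that are present.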

Putting it together

Why k-medoids pairs well with Gower distance

K-medoids only needs a distance function. Gower gives a sensible distance for mixed data.

K-medoids uses real points as centers, so it works with any distance metric.
Gower handles numbers, categories, rankings, and missing values in one score.
Together, they work well for surveys, customer profiles, and behavioral data.
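A minimal k-medoids sketch makes the "only needs a distance function" point concrete. This uses the simple alternating (Voronoi-iteration) style rather than classic PAM swap updates, and plain absolute difference on toy 1-D data; a Gower-style metric could be passed in unchanged.

```python
def kmedoids(points, medoids, dist, iters=10):
    for _ in range(iters):
        # Assignment: each point joins its nearest medoid.
        clusters = [[] for _ in medoids]
        for p in points:
            nearest = min(range(len(medoids)), key=lambda i: dist(p, medoids[i]))
            clusters[nearest].append(p)
        # Update: the new medoid is the member with the smallest
        # total distance to the rest of its cluster.
        medoids = [min(c, key=lambda m: sum(dist(m, q) for q in c)) if c else medoids[i]
                   for i, c in enumerate(clusters)]
    return medoids, clusters

medoids, clusters = kmedoids([1, 2, 3, 10, 11, 12, 99], [1, 12],
                             dist=lambda a, b: abs(a - b))
print(medoids)  # [2, 11]: both centers are real data points
```

Note that the outlier 99 joins a cluster without dragging its medoid off the main group of points.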

Choosing k

Too few vs too many clusters

Start simple, then adjust until the groups feel meaningful and stable.

Elbow intuition

As k increases, clusters get tighter. Look for the point where improvements slow down.
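The elbow idea can be sketched by rerunning a toy k-means for increasing k and printing the within-cluster sum of squares; the naive seeding and the 1-D data are assumptions for illustration.

```python
def assign(points, centers):
    clusters = [[] for _ in centers]
    for p in points:
        clusters[min(range(len(centers)), key=lambda i: abs(p - centers[i]))].append(p)
    return clusters

def kmeans(points, k, iters=20):
    centers = points[:k]  # naive seeding, good enough for this sketch
    for _ in range(iters):
        clusters = assign(points, centers)
        centers = [sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)]
    return centers, clusters

points = [1, 2, 3, 10, 11, 12]
for k in (1, 2, 3):
    centers, clusters = kmeans(points, k)
    # Within-cluster sum of squares: total squared distance to each cluster's mean.
    inertia = sum((p - sum(c) / len(c)) ** 2 for c in clusters if c for p in c)
    print(k, round(inertia, 1))
```

Here the drop from k = 1 to k = 2 is dramatic, while k = 3 barely helps: the elbow sits at k = 2, matching the two visible groups in the data.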

Too few clusters: different behaviors get lumped together.

Too many clusters: every small variation becomes its own group.

A good k balances clarity with usefulness. If you can explain each group in one sentence, you are close.