Clustering

Clustering groups similar rows together to reveal natural segments (e.g., types of customers or respondent profiles). It works well for survey and behavioral datasets that mix Numbers, Single Category, Multi Category, and Opinion Scale columns.

Open the tool

  1. Click the More menu.
  2. Choose Create clusters.

This opens a panel where you pick the columns to include, choose an algorithm, and run clustering. After reviewing the results, you can add a calculated cluster column to your dataset.

How it works (quick version)

  • Mixed (recommended): Finds similarity across mixed data types without heavy prep. Great for surveys with categories and numbers.
  • K‑means (numeric/ordinal only): Classic algorithm for purely numeric or ordinal data when you want a fixed number of clusters.

Both approaches look for groups of rows that are more similar to each other than to the rest of the dataset. In mixed mode, AddMaple uses a distance measure that compares numeric and categorical answers in the same calculation.
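To make the idea of a mixed-type distance concrete, here is a minimal Python sketch of a Gower-style distance. This is an assumption for illustration only: the article does not say which distance AddMaple uses, and the function name, the column kinds ("number", "single", "multi"), and the sample rows are all hypothetical.

```python
def gower_distance(a, b, kinds, ranges):
    """Average per-column dissimilarity between two rows (each term is 0..1)."""
    total = 0.0
    for col, kind in kinds.items():
        if kind == "number":
            # Numeric: absolute difference scaled by the column's value range.
            total += abs(a[col] - b[col]) / ranges[col]
        elif kind == "single":
            # Single category: 0 if the answers match, 1 otherwise.
            total += 0.0 if a[col] == b[col] else 1.0
        elif kind == "multi":
            # Multi category: 1 minus the Jaccard overlap of selected options.
            union = a[col] | b[col]
            overlap = len(a[col] & b[col]) / len(union) if union else 1.0
            total += 1.0 - overlap
    return total / len(kinds)


# Hypothetical survey rows and column kinds.
kinds = {"age": "number", "plan": "single", "channels": "multi"}
ranges = {"age": 50.0}  # max - min of the numeric column, assumed known
r1 = {"age": 30, "plan": "Pro", "channels": {"email", "web"}}
r2 = {"age": 40, "plan": "Pro", "channels": {"email"}}
r3 = {"age": 70, "plan": "Free", "channels": {"phone"}}

near = gower_distance(r1, r2, kinds, ranges)  # similar respondents
far = gower_distance(r1, r3, kinds, ranges)   # dissimilar respondents
```

Because every column contributes a value between 0 and 1, no single data type dominates the result, which is why this family of distances suits survey data without heavy prep.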

Step 1 — Select columns

Choose the columns you want to use for clustering. You can include:

  • Numbers and Opinion Scales
  • Single Category and Multi Category

Tip: Include a balanced mix of behavioral and attitudinal variables. Remove obvious duplicates to avoid overweighting the same idea twice.

Step 2 — Choose algorithm

  • Mixed (recommended): Works with Numbers, Opinion Scales, Single Category, and Multi Category. Best default for survey data.
  • K‑means: Requires numeric/ordinal inputs only. When selected, incompatible columns are dropped from the clustering run automatically.
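As a rough illustration of the K‑means path, the sketch below first drops columns that are not numeric or ordinal, then runs a plain Lloyd's-algorithm K‑means with a fixed number of clusters. It is not AddMaple's implementation; the kind labels, function names, and sample data are hypothetical, and a real run would usually scale columns first.

```python
def numeric_columns(kinds):
    """Keep only the columns K-means can use (numeric or ordinal)."""
    return [c for c, k in kinds.items() if k in ("number", "opinion_scale")]


def kmeans(points, centroids, iterations=10):
    """Plain Lloyd's algorithm on lists of floats with given start centroids."""
    centroids = [list(c) for c in centroids]  # do not mutate the caller's lists
    labels = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(
                range(len(centroids)),
                key=lambda j: sum((p - c) ** 2 for p, c in zip(point, centroids[j])),
            )
            for point in points
        ]
        # Update step: move each centroid to the mean of its members.
        for j in range(len(centroids)):
            members = [pt for pt, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids


# Hypothetical dataset: "plan" is categorical, so it is dropped for K-means.
kinds = {"spend": "number", "satisfaction": "opinion_scale", "plan": "single"}
rows = [
    {"spend": 10.0, "satisfaction": 1, "plan": "Free"},
    {"spend": 12.0, "satisfaction": 2, "plan": "Free"},
    {"spend": 90.0, "satisfaction": 5, "plan": "Pro"},
    {"spend": 95.0, "satisfaction": 4, "plan": "Pro"},
]
cols = numeric_columns(kinds)
points = [[float(r[c]) for c in cols] for r in rows]
# K is fixed at 2 here by supplying two starting centroids.
labels, centers = kmeans(points, centroids=[points[0], points[2]])
```

The number of clusters is set by how many starting centroids you pass in, which mirrors the requirement to pick a fixed number in K‑means mode.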

Advanced options (optional)

  • Number of clusters: In mixed mode, use Auto (recommended) or fix a number. In K‑means, you must pick a fixed number.
  • Max clusters (auto mode): Upper bound when the number of clusters is set to Auto.
  • Min cluster size: Minimum rows required to form a cluster. Higher values produce more stable groups.

You can reset to recommended settings anytime.

Run and review

Click Run Clustering. You'll see:

  • Detected clusters (and a possible "Noise" group for outliers)
  • Each cluster's size and percent of rows
  • Top features per cluster to help explain what makes the group distinct

How to read features:

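  • Numeric features show the cluster mean and a z‑score: how many standard deviations the cluster mean sits above or below the dataset mean.
  • Categorical features show the percent of the cluster with that answer and a lift (×): that percent divided by the overall percent, so 2× means twice as common as in the full dataset.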
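The two feature statistics can be reproduced by hand. The sketch below uses the standard definitions of z‑score and lift; the choice of population standard deviation, the function names, and the sample data are assumptions, so AddMaple's exact figures may differ slightly.

```python
from statistics import mean, pstdev


def numeric_feature(values, labels, cluster):
    """Cluster mean and its z-score against the whole dataset."""
    in_cluster = [v for v, lab in zip(values, labels) if lab == cluster]
    cluster_mean = mean(in_cluster)
    # Distance from the dataset mean, in dataset standard deviations.
    z = (cluster_mean - mean(values)) / pstdev(values)
    return cluster_mean, z


def categorical_feature(values, labels, cluster, category):
    """Share of the cluster giving `category`, and its lift vs the overall share."""
    in_cluster = [v for v, lab in zip(values, labels) if lab == cluster]
    cluster_share = in_cluster.count(category) / len(in_cluster)
    overall_share = values.count(category) / len(values)
    return cluster_share, cluster_share / overall_share


# Hypothetical data: four rows assigned to clusters "A" and "B".
labels = ["A", "A", "B", "B"]
spend = [10.0, 20.0, 30.0, 40.0]
plan = ["Pro", "Pro", "Free", "Pro"]

b_mean, b_z = numeric_feature(spend, labels, "B")
a_share, a_lift = categorical_feature(plan, labels, "A", "Pro")
```

In this toy data, cluster B spends above the dataset average (positive z‑score), and "Pro" is over-represented in cluster A (lift above 1×).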

Name and save clusters

You may see suggested names and descriptions for clusters to speed up labeling. When you're happy, click Add Cluster Column to add a new Single Category column to your dataset with the cluster labels. You can rename values later.
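Conceptually, adding the cluster column attaches one label per row. The following sketch shows that idea on plain Python dictionaries; the function name, the use of -1 for outliers, and the cluster names are hypothetical and not taken from AddMaple.

```python
def add_cluster_column(rows, labels, names, column="Cluster"):
    """Return copies of the rows with a new categorical column of cluster names."""
    return [
        # Labels without a name (here -1, an assumed outlier marker) become "Noise".
        {**row, column: names.get(label, "Noise")}
        for row, label in zip(rows, labels)
    ]


rows = [{"id": 1}, {"id": 2}, {"id": 3}]
labels = [0, 1, -1]
names = {0: "Budget-conscious", 1: "Power users"}

labeled = add_cluster_column(rows, labels, names)

# Renaming a value later only means changing the name mapping.
names[1] = "Heavy users"
relabeled = add_cluster_column(rows, labels, names)
```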

Tips and limitations

  • Unsupervised insight: Clusters describe patterns; they aren't "right or wrong". Try different column sets and look for themes that persist across runs.
  • Rare categories can be noisy: Consider combining very small groups or increasing Min cluster size.
  • Mixed vs K‑means: Use Mixed for survey‑style data. Use K‑means when your inputs are all numeric/ordinal and you want a fixed K.

Availability: Clustering is limited to certain plans.