Clustering
Clustering groups similar rows together to reveal natural segments (e.g., types of customers or respondent profiles). It works well for survey and behavioral datasets where you have a mix of Numbers, Opinion Scales, and Single or Multi Category columns.
Open the tool
- Click the More menu.
- Choose Create clusters.
This opens a panel where you pick the columns to include, choose an algorithm, and run clustering. After reviewing the results, you can add a calculated cluster column to your dataset.
How it works (quick version)
- Mixed (recommended): Finds similarity across mixed data types without heavy prep. Great for surveys with categories and numbers.
- K‑means (numeric/ordinal only): Classic algorithm for purely numeric or ordinal data when you want a fixed number of clusters.
Both approaches look for groups of rows that are more similar to each other than to the rest of the dataset. In mixed mode, AddMaple uses a distance that can compare numeric and categorical answers together.
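AddMaple doesn't say which distance mixed mode uses. A common choice for mixed survey data is Gower distance, which scores each column on a 0-to-1 scale and averages the results. The sketch below is a simplified Gower-style distance for illustration, not AddMaple's implementation. The column kinds (`numeric`, `single`, `multi`), the function name, and the use of Jaccard distance for multi-select answers are assumptions.

```python
def gower_distance(a, b, kinds, ranges):
    """Simplified Gower-style distance between two rows, in [0, 1].

    Assumptions (illustrative, not AddMaple's method):
      kinds[i]  is "numeric", "single", or "multi"
      ranges[i] is max - min of a numeric column across the dataset
      multi-select answers are sets; None means missing and is skipped
    """
    total, used = 0.0, 0
    for x, y, kind, rng in zip(a, b, kinds, ranges):
        if x is None or y is None:
            continue  # ignore columns where either row is missing
        if kind == "numeric":
            # Absolute difference scaled by the column's range.
            d = abs(x - y) / rng if rng else 0.0
        elif kind == "single":
            # Categories either match (0) or don't (1).
            d = 0.0 if x == y else 1.0
        else:
            # Multi-select: Jaccard distance between the chosen options.
            union = x | y
            d = 1.0 - len(x & y) / len(union) if union else 0.0
        total += d
        used += 1
    return total / used if used else 0.0


kinds = ["numeric", "single", "multi"]
ranges = [50, None, None]  # e.g., ages span 50 years in the dataset
r1 = (30, "Yes", {"email", "sms"})
r2 = (40, "No", {"email"})
print(gower_distance(r1, r2, kinds, ranges))  # (0.2 + 1.0 + 0.5) / 3
```

Because every column contributes a value between 0 and 1, a wide numeric range such as income can't swamp a yes/no question, which is why this family of distances suits survey data.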
Step 1 — Select columns
Choose the columns you want to use for clustering. You can include:
- Numbers and Opinion Scales
- Single Category and Multi Category
Tip: Include a balanced mix of behavioral and attitudinal variables. Remove obvious duplicates, such as two questions that measure the same thing, so one idea isn't weighted twice.
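One way to spot obvious duplicates among numeric columns before clustering is to check pairwise correlation. This is a minimal sketch, assuming each column is an equal-length list of numbers; the function names and the 0.9 threshold are hypothetical, not AddMaple settings.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric lists (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


def near_duplicates(columns, threshold=0.9):
    """List column pairs whose absolute correlation meets the (hypothetical) threshold."""
    flagged = []
    for (name_a, a), (name_b, b) in combinations(columns.items(), 2):
        r = pearson(a, b)
        if abs(r) >= threshold:
            flagged.append((name_a, name_b, round(r, 3)))
    return flagged


columns = {
    "visits_per_month": [1, 2, 3, 4, 5],
    "visits_per_year": [12, 24, 36, 48, 60],
    "satisfaction": [3, 1, 4, 1, 5],
}
print(near_duplicates(columns))  # flags only the two visit columns
```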
Step 2 — Choose algorithm
- Mixed (recommended): Works with Numbers, Single Category, and Multi Category. Best default for survey data.
- K‑means: Requires numeric or ordinal inputs only. When K‑means is selected, incompatible columns (such as Single and Multi Category) are dropped automatically.
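To see why K‑means needs numeric inputs, here is a minimal sketch of Lloyd's algorithm (the standard K‑means procedure) that first drops non-numeric columns, mirroring the behavior described above. The column kind labels, function names, and the simple first-k-distinct-rows initialization are assumptions for illustration; they don't reflect how AddMaple initializes or scales data.

```python
from math import dist


def numeric_rows(table, kinds):
    """Keep only numeric/ordinal columns (hypothetical kind labels) and return row tuples."""
    keep = [name for name in table if kinds[name] in ("numeric", "ordinal")]
    rows = list(zip(*(table[name] for name in keep)))
    return keep, rows


def kmeans(rows, k, max_iter=100):
    """Lloyd's algorithm, starting from the first k distinct rows (simple and deterministic)."""
    centroids = []
    for row in rows:
        if row not in centroids:
            centroids.append(row)
        if len(centroids) == k:
            break
    if len(centroids) < k:
        raise ValueError("need at least k distinct rows")
    labels = None
    for _ in range(max_iter):
        # Assignment step: each row goes to its nearest centroid (Euclidean distance).
        new_labels = [min(range(k), key=lambda c: dist(row, centroids[c])) for row in rows]
        if new_labels == labels:
            break  # assignments stopped changing, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its rows (keep it if empty).
        for c in range(k):
            members = [row for row, lab in zip(rows, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


table = {
    "age": [20, 22, 21, 60, 62, 61],
    "score": [1, 2, 1, 9, 8, 9],
    "region": ["N", "S", "N", "S", "N", "S"],  # Single Category: dropped
}
kinds = {"age": "numeric", "score": "ordinal", "region": "single"}
keep, rows = numeric_rows(table, kinds)
print(keep, kmeans(rows, k=2))  # ['age', 'score'] [0, 0, 0, 1, 1, 1]
```

Averaging and Euclidean distance only make sense for numbers, which is why category columns can't take part. In practice, numeric columns are usually standardized first so that a wide-range column such as age doesn't dominate the distance.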
Advanced options (optional)
- Number of clusters: In mixed mode, use Auto (recommended) or fix a number. In K‑means, you must pick a fixed number.
- Max clusters (auto mode): Upper bound when the number of clusters is set to Auto.
- Min cluster size: Minimum rows required to form a cluster. Higher values produce more stable groups.
You can reset to recommended settings anytime.
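AddMaple doesn't document exactly what happens to rows that land in a group smaller than Min cluster size. A minimal sketch, assuming such rows are relabeled as Noise, similar in spirit to how density-based methods such as HDBSCAN treat their minimum cluster size; the function name is hypothetical.

```python
from collections import Counter


def apply_min_cluster_size(labels, min_size, noise="Noise"):
    """Relabel rows in clusters smaller than min_size as noise (assumed behavior)."""
    sizes = Counter(labels)
    return [lab if sizes[lab] >= min_size else noise for lab in labels]


labels = ["A", "A", "A", "B", "C", "C"]
print(apply_min_cluster_size(labels, min_size=2))  # ['A', 'A', 'A', 'Noise', 'C', 'C']
```

Raising the minimum leaves only groups backed by more rows, which is one reason larger values tend to give steadier clusters.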
Run and review
Click Run Clustering. You'll see:
- Detected clusters (and a possible "Noise" group for outliers)
- Each cluster's size and percent of rows
- Top features per cluster to help explain what makes the group distinct
How to read features:
- Numeric features show the cluster mean and a z‑score vs the dataset mean.
- Categorical features show the percent in the cluster and a lift (×) vs overall.
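The two feature statistics can be sketched as follows. This assumes the z‑score is the cluster mean minus the overall mean, divided by the overall (population) standard deviation, and that lift is the cluster percent divided by the overall percent. AddMaple's exact formulas aren't stated, and the function names are hypothetical.

```python
from statistics import mean, pstdev


def numeric_feature(values, labels, cluster):
    """Cluster mean and z-score: (cluster mean - overall mean) / overall std dev (assumed formula)."""
    in_cluster = [v for v, lab in zip(values, labels) if lab == cluster]
    cluster_mean, overall_mean, sd = mean(in_cluster), mean(values), pstdev(values)
    z = (cluster_mean - overall_mean) / sd if sd else 0.0
    return cluster_mean, z


def categorical_feature(values, labels, cluster, category):
    """Percent of the cluster giving `category`, and lift = cluster share / overall share."""
    in_cluster = [v for v, lab in zip(values, labels) if lab == cluster]
    share_cluster = sum(v == category for v in in_cluster) / len(in_cluster)
    share_overall = sum(v == category for v in values) / len(values)
    lift = share_cluster / share_overall if share_overall else float("nan")
    return 100 * share_cluster, lift


labels = ["A", "A", "B", "B"]
print(numeric_feature([2, 4, 6, 8], labels, "A"))  # mean 3, negative z
print(categorical_feature(["Yes", "Yes", "No", "Yes"], labels, "A", "Yes"))  # 100% in A vs 75% overall
```

Under these definitions, a lift of 1.5× means the answer is 50% more common in the cluster than in the dataset as a whole, and a z‑score of +1 means the cluster average sits one standard deviation above the overall average.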
Name and save clusters
You may see suggested names and descriptions for clusters to speed up labeling. When you're happy, click Add Cluster Column to add a new Single Category column to your dataset with the cluster labels. You can rename values later.
Tips and limitations
- Unsupervised insight: Clusters describe patterns; they aren't "right or wrong". Try different column sets to see stable themes.
- Rare categories can be noisy: Consider combining very small groups or increasing Min cluster size.
- Mixed vs K‑means: Use Mixed for survey‑style data. Use K‑means when all your inputs are numeric or ordinal and you want a fixed number of clusters (K).
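Combining very small categories before clustering, as the tip on rare categories suggests, can be as simple as relabeling any answer below a share threshold. A minimal sketch; the function name, the 5% default threshold, and the "Other" label are hypothetical choices, not AddMaple defaults.

```python
from collections import Counter


def combine_rare(values, min_share=0.05, other="Other"):
    """Replace categories below min_share of rows with a single catch-all label."""
    counts = Counter(values)
    n = len(values)
    return [v if counts[v] / n >= min_share else other for v in values]


answers = ["Email"] * 10 + ["Phone"] * 9 + ["Fax"]
print(Counter(combine_rare(answers, min_share=0.1)))  # Fax (5% of rows) becomes Other
```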
Availability: Clustering is limited to certain plans.